Skip to content

Conversation

ZorinAnton
Copy link
Contributor

@ZorinAnton ZorinAnton commented Jul 8, 2025

As discussed, splitting the PR #428 into dml and ddl parts. This PR is for ddl support.

For the ddl sql statements the following relations are created:

  • NamedWrite for create table as
  • NamedDdl for create view

All other types of sql statements are handled as before.

Since DDL statements have no representation in Calcite's relational algebra, these changes conflict with the proposed split of SqlToSubstrait into SqlToCalcite and CalciteToSubstrait: #362. This is a key architectural decision that requires careful design.

@nielspardon
Copy link
Member

@vbarua can you also take a look at this. I want to ensure the further refactoring to isthmus is somewhat aligned with these changes.

@vbarua vbarua changed the title Add ddl support to SqlToSubstrait feat: support DDL statements in SqlToSubstrait Aug 14, 2025
import org.apache.calcite.sql.util.SqlBasicVisitor;
import org.apache.calcite.sql2rel.SqlToRelConverter;

public class DdlRelBuilder extends SqlBasicVisitor<Plan.Root> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The one thing that might be at odds with some of the work I've been doing on the API is that this returns a Plan.Root instead of a RelRoot or RelNode.

What I've been trying to do is modularize the API along two lines, roughly:

  • SQL -> Calcite RelRoot
  • Calcite RelRoot -> Isthmus Plan

The aim of this is to enable the workflows of users who are starting from SQL, and users who are starting directly with Calcite RelNodse, but share that majority of code for them. With the API modularized like this, SqlToSubstrait can effectively become the composition of these 2 conversion APIs. However, because what you're proposing here is effectively going from SQL -> Isthmus POJO directly, it makes that a bit trickier.

Aside from that goal though, a better argument for changing this is that this approach only works when going from SQL to POJO. If we assume that we want to be able to read in Substrait plans that create tables, then we'll need to have a first class entity in Calcite to model this. The built-in LogicalTableModify doesn't work for this, but we can add our own TableCreate relation, along with a LogicalTableCreate. That can be used when going from Sql to Calcit, from Calcite to Isthmus, and also from Isthmus back to Calcite.

Does that make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the overall approach makes sense. The question is how to design this properly on the calcite's side.
I see the following options:

  1. Use separate class for DDL relations.
    It means we can

    • add the DdlRel class
    • implement required methods for visitors (not sure what's better here - introduce a new abstract visitor class for ddl or extend the RelVisitor)
    • add the DdlPlanRoot class (with the semantic similar to the RelRoot)
      Probably the DdlPlanRoot class should have the same base class with the RelRoot class for uniform handling in Sql->Calcite conversion.
  2. Extend the RelNode class to support DDL. This simplifies the required changes, but I guess it diverges from Calcite conventions and add unnecessary complexity to RelNode

Please let me know your thoughts.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 makes the most sense to me. I think CREATE is the only operation that we have in Substrait that's not supported as a Calcite relation. I would re-use Calcite's LogicalModify relation for INSERT, DELETE and UPDATE like we are already doing, and the create a relation specifically for CREATE.

implement required methods for visitors

We should be able to add a visit method for our new relation in the Isthmu sRelNodeVisitor utility. The Calcite RelShuttle also has a

RelNode visit(RelNode other);

method for cases like this if we need that.

Would we need a custom RelRoot class, or could we just use the existing RelRoot with the SqlKind set to CREATE_TABLE or OTHER_DDL

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess making changes in calcite will take time, what should we do with this PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess making changes in calcite will take time,

Ah, I was not suggesting we make the changes in Calcite. We should be able to define our own RelNode in Isthmus with something like:

package io.substrait.isthmus.calcite.relations;

import org.apache.calcite.rel.AbstractRelNode;

public abstract class TableCreate extends AbstractRelNode {
//    ...
}

public class LogicalTableCreate extends TableCreate {
//    ...
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I thought we need to extend a hierarchy that's a part of calcite. If we can do that in the isthmus code it will be much easier.

Copy link
Member

@vbarua vbarua Aug 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok yeah we can do something much simpler than extending Calcite. I was pointing out the

RelNode visit(RelNode other);

because that's the mechanism by which Calcite let's you visit nodes that you define outside of Calcite core.

Copy link
Contributor Author

@ZorinAnton ZorinAnton Aug 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems with this approach we still have to make DdlRel (Create) a child of RelNode, but I have a couple of concerns with it:

  • RelNode represents a relational expression, that processes data (records), while DdlRel does something else (it modifies the database catalog with a new entry for table, view or mqt)
  • RelNode has the getRowType method which represents the type of the rows returned by the relational expression, which is not applicable for DdlRel

So I'm not sure if it is correct to say that DdlRel is RelNode. In my opinion, it would be better if they have the same parent class. But this probably requires changes in calcite.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vbarua I introduced a new class CalciteOperation to represent relational algebra and ddl operations. Please take a look when you get a chance. If the suggested approach makes sense I will update SubstraitToCalcite accordingly.

@ZorinAnton ZorinAnton force-pushed the zor-sql-to-ddl2 branch 6 times, most recently from c3139c4 to f8041dd Compare August 22, 2025 12:24
@ZorinAnton ZorinAnton requested a review from vbarua August 22, 2025 12:53
# Conflicts:
#	isthmus/src/main/java/io/substrait/isthmus/SqlToSubstrait.java

# Conflicts:
#	isthmus/src/main/java/io/substrait/isthmus/SqlToSubstrait.java

# Conflicts:
#	isthmus/src/main/java/io/substrait/isthmus/SqlToSubstrait.java
#	isthmus/src/test/java/io/substrait/isthmus/DdlRoundtripTest.java
@ZorinAnton
Copy link
Contributor Author

so we should restrict our processing to only handle what we've chosen to support explicitly for now.

Makes sense. I've added a validator to SubstraitRelNodeConverter to ensure incoming relations have a supported configuration. If this approach is acceptable, we can move the validation logic for non-DDL and non-DML relations there as well (probably in a separate PR).

@vbarua
Copy link
Member

vbarua commented Sep 17, 2025

@ZorinAnton could you avoid rebasing your PR every time you update it. It makes it harder to review because I can't just look at the new commits, and it messes with Github PR view a bit. The end result will be squash merged as well, so we don't need to keep the branch fresh with main.

@ZorinAnton
Copy link
Contributor Author

sure, will do, I just didn't want to have a separate commit per every singe change.

Copy link
Member

@vbarua vbarua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just didn't want to have a separate commit per every singe change.

Because it gets squashed at the end this won't matter. When I merge as well, I summarize the individual commits into a coherent message.

I've made some suggestions for this a separate PR targetting your PR
ZorinAnton#1

I've added a validator to SubstraitRelNodeConverter to ensure incoming relations have a supported configuration.

I don't think we need that level of machinary for this work. For one thing, in the long-term we shouldn't need those validations at all because we should be able to convert ALL valid Substrait messages to Calcite and back. In my suggestions, I've inlined very basic restrictions.

I'm not opposed to validations in general, but I don't want to make them part of this PR because if we start fiddling with a public API for validations it will extend how long this PR remains open.

@ZorinAnton
Copy link
Contributor Author

I've made some suggestions for this a separate PR targetting your PR

Thanks, merged it.

in the long-term we shouldn't need those validations at all because we should be able to convert ALL valid Substrait messages to Calcite and back.

before it happens, a dedicated validator would give a quick view of what is supported and what's not, but I agree that it needs to be addressed in a separate PR.

@ZorinAnton ZorinAnton requested a review from vbarua September 18, 2025 09:09
@vbarua vbarua merged commit 260a1c4 into substrait-io:main Sep 19, 2025
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants