Conversation

@holdenk (Contributor) commented Sep 30, 2024

What changes were proposed in this pull request?

Make ColumnNodeToExpressionConverter and ExpressionColumnNode DeveloperApis instead of internal, so that advanced external developers can write codegen columns that take columns as input.

Why are the changes needed?

After the introduction of the ColumnNode interface, advanced external developers building codegen expressions who wish to provide a Scala API have to hack around and access internal Spark APIs. This is illustrated in my PR upgrading spark-testing-base's codegen example to Spark 4 preview2 -- see holdenk/spark-testing-base#423
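For context, a rough sketch of the kind of thing this change enables. The two class names come from this PR's diff; the helper name, the Column accessors, and the exact apply signatures are my assumptions, not verified API:

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions.{Add, Expression}
import org.apache.spark.sql.internal.{ColumnNodeToExpressionConverter, ExpressionColumnNode}

// Hypothetical helper: a Scala API around a custom (possibly codegen'd)
// expression that takes other Columns as input.
def myCodegenFunction(left: Column, right: Column): Column = {
  // Convert the public Columns back into Catalyst Expressions...
  val leftExpr: Expression = ColumnNodeToExpressionConverter(left.node)
  val rightExpr: Expression = ColumnNodeToExpressionConverter(right.node)
  // ...and wrap the resulting Expression back up as a Column.
  Column(ExpressionColumnNode(Add(leftExpr, rightExpr)))
}
```

With both classes private[sql], neither direction of this round trip is reachable from third-party code without placing classes in Spark's own packages.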

Does this PR introduce any user-facing change?

No -- developer-only change.

How was this patch tested?

Annotation/visibility-only change; verified manually that these APIs are sufficient for holdenk/spark-testing-base#423.

Was this patch authored or co-authored using generative AI tooling?

No

@holdenk holdenk requested a review from hvanhovell September 30, 2024 21:40
@github-actions github-actions bot added the SQL label Sep 30, 2024

-private[sql] object ColumnNodeToExpressionConverter extends ColumnNodeToExpressionConverter {
+@DeveloperApi
+object ColumnNodeToExpressionConverter extends ColumnNodeToExpressionConverter {
Contributor

Quick question, we have an implicit class in SparkSession called RichColumn. That will 'restore' the expr functionality. Is that something that would work for you? Alternatively, if the use of SparkSession is cumbersome, we could also add another implicit class to org.apache.spark.sql.classic.ClassicConversions.

Contributor Author

So what about Java users?

Contributor

Sure, I see your point. I would love to meet the Java developer that works with Spark internals :)...

They can still use these objects.
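For the record, Java can reach a Scala object through the synthetic MODULE$ singleton field the Scala compiler generates, so a hedged sketch of Java usage (assuming these become DeveloperApis as proposed; the class name here is hypothetical) might look like:

```java
// A Scala `object` compiles to a class with a static MODULE$ singleton,
// so Java callers can invoke the converter without any Scala implicits.
class ConverterFromJava {
  static org.apache.spark.sql.catalyst.expressions.Expression convert(
      org.apache.spark.sql.internal.ColumnNode node) {
    return org.apache.spark.sql.internal
        .ColumnNodeToExpressionConverter$.MODULE$.apply(node);
  }
}
```

This only works if the object is public, which is exactly what the visibility change in this PR provides.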

 */
-private[sql] case class ExpressionColumnNode private(
+@DeveloperApi
+case class ExpressionColumnNode private(
Contributor

You could use org.apache.spark.sql.classic.ClassicConversions for this. I am not saying that you should, some people don't like implicits.

Contributor Author

It doesn't handle both directions of the conversion, though, and, as you point out, it's also an implicit conversion, which (in my opinion) is some pretty unnecessary magic for this.

Contributor

In this case implicits let a developer migrate with a minimal amount of changes; that is why they are there. I am generally not a big fan, but in this case they do make migration a lot easier.

You could call org.apache.spark.sql.classic.ClassicConversions.column directly if you want to avoid implicits.
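A hedged sketch of that non-implicit route. I haven't verified the exact signature of ClassicConversions.column; it is assumed here to take a Catalyst Expression:

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.classic.ClassicConversions

// Call the conversion explicitly instead of importing the implicits,
// keeping the "magic" visible at the call site.
def toColumn(e: Expression): Column = ClassicConversions.column(e)
```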

Contributor

@hvanhovell left a comment

LGTM.

Member

@dongjoon-hyun left a comment

It's a little weird to have a DeveloperApi in the org.apache.spark.sql.internal package.

For example, the Apache Spark codebase has no DeveloperApi in an internal package so far.

$ git grep DeveloperApi | grep internal | wc -l
       0

If we need to open this, shall we move the package location from internal to a more proper package location, @holdenk ?

@holdenk (Contributor Author) commented Dec 5, 2024

If we need to open this, shall we move the package location from internal to a more proper package location, @holdenk ?

Seems reasonable to me.

@holdenk holdenk force-pushed the SPARK-49828-column-expression-apis branch from f6e7c5a to 54e990d Compare December 5, 2024 22:59
@github-actions (bot)
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Mar 16, 2025
@github-actions github-actions bot closed this Mar 17, 2025