[SPARK-49828][SQL] Expose ColumnNode to/from Expression utils as developer APIs #48306
Conversation
-  private[sql] object ColumnNodeToExpressionConverter extends ColumnNodeToExpressionConverter {
+  @DeveloperApi
+  object ColumnNodeToExpressionConverter extends ColumnNodeToExpressionConverter {
Quick question, we have an implicit class in SparkSession called RichColumn. That will 'restore' the expr functionality. Is that something that would work for you? Alternatively, if the use of SparkSession is cumbersome, we could also add another implicit class to org.apache.spark.sql.classic.ClassicConversions.
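For illustration, a minimal sketch of that route, assuming RichColumn is a member implicit class on the classic SparkSession that adds an expr accessor to Column (the exact import path is an assumption here):

import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.classic.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[1]").getOrCreate()
import spark.RichColumn // assumption: brings the implicit wrapper into scope

val e: Expression = col("a").expr // expr provided by the implicit class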
So for Java users?
Sure, I see your point. I would love to meet the Java developer that works with Spark internals :)...
They can still use these objects.
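For example, a rough sketch of calling the converter object directly, which needs no implicits and is reachable from Java as well (assuming the object is a ColumnNode => Expression function, as the trait it extends suggests):

import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.internal.{ColumnNode, ColumnNodeToExpressionConverter}

def toExpr(node: ColumnNode): Expression =
  ColumnNodeToExpressionConverter(node) // from Java: ColumnNodeToExpressionConverter$.MODULE$.apply(node)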
   */
-  private[sql] case class ExpressionColumnNode private(
+  @DeveloperApi
+  case class ExpressionColumnNode private(
You could use org.apache.spark.sql.classic.ClassicConversions for this. I am not saying that you should; some people don't like implicits.
So it doesn't handle both directions of the conversion and, as you point out, it's also an implicit conversion, which (in my opinion) is some pretty unnecessary magic for this.
In this case implicits provide a way for a developer to make a minimal amount of changes; that is why they are there. I am generally not a big fan, but in this case they do make migration a lot easier.
You could call org.apache.spark.sql.classic.ClassicConversions.column directly if you want to avoid implicits.
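A minimal sketch of the explicit route, assuming the column helper mentioned above takes a catalyst Expression and returns a Column:

import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.classic.ClassicConversions

// Explicit call; no implicit conversions need to be in scope.
val one: Column = ClassicConversions.column(Literal(1))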
hvanhovell left a comment:
LGTM.
It's a little weird to have DeveloperApi in the org.apache.spark.sql.internal package.
For example, the Apache Spark codebase has no DeveloperApi in an internal package so far:
$ git grep DeveloperApi | grep internal | wc -l
0
If we need to open this, shall we move the package location from internal to a more proper package location, @holdenk?
Seems reasonable to me.
Force-pushed from f6e7c5a to 54e990d.
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
Make ColumnNodeToExpressionConverter and ExpressionColumnNode into DeveloperApis instead of internal so that external advanced developers can write columns with codegen that take columns as input.
Why are the changes needed?
After the introduction of the ColumnNode interface, external advanced developers building codegen expressions who wish to provide a Scala API have to hack around and access internal Spark APIs. This is illustrated in my PR upgrading spark-testing-base's codegen example to Spark 4 preview 2 -- see holdenk/spark-testing-base#423
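For context, a rough sketch of the round trip such a library needs once these are DeveloperApis; the Column.node accessor, the Column constructor taking a ColumnNode, and the ExpressionColumnNode.apply factory are assumptions about the Spark 4 Column API:

import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.internal.{ColumnNodeToExpressionConverter, ExpressionColumnNode}

// Column -> catalyst Expression: unwrap the ColumnNode and run it through the converter.
def toExpression(c: Column): Expression =
  ColumnNodeToExpressionConverter(c.node)

// catalyst Expression (e.g. a custom codegen expression) -> Column: wrap it back into a ColumnNode.
def toColumn(e: Expression): Column =
  new Column(ExpressionColumnNode(e))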
Does this PR introduce any user-facing change?
No -- developer only
How was this patch tested?
Annotation/visibility-only change; verified (manually) that these APIs would be sufficient in holdenk/spark-testing-base#423
Was this patch authored or co-authored using generative AI tooling?
No