
Scala build infra #1211


Open: snazy wants to merge 1 commit into main

Conversation

@snazy (Member) commented Mar 19, 2025

Adds build-infra related work for Scala projects, like #1208

@sfc-gh-ygu (Contributor) left a comment


Introducing Scala into a Java codebase can increase complexity and reduce maintainability. It requires the team to manage interoperability concerns, handle differences in toolchains, and maintain expertise in two languages simultaneously, potentially causing confusion, increased onboarding time, and decreased productivity.

Do we really want to introduce Scala code in Polaris?

cc @RussellSpitzer @jbonofre @collado-mike @dennishuo @eric-maynard

@eric-maynard (Contributor) commented

It seems like the intent in #1190 is to use Scala too. Is that right?

@snazy (Member, Author) commented Mar 20, 2025

So, let me summarize:
a) the project wants a Spark plugin, and Spark relies on Scala
b) the project wants the benchmarks, and those run on Gatling, which relies on Scala

I'm not a Scala fan, but for Spark + Gatling there's just no way around that.
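
For context, the coupling shows up directly in dependency coordinates: Spark artifacts carry the Scala binary version in their names, and Gatling is written in Scala. Below is a minimal Gradle (Kotlin DSL) sketch; the coordinates and versions are illustrative assumptions, not the build files from this PR.

```kotlin
// Hypothetical fragment of a module's build.gradle.kts (assumes the java-library plugin
// is applied); artifact versions are illustrative, not taken from this PR.
dependencies {
    // Spark SQL compiled against Scala 2.12; the "_2.12" suffix is the Scala binary version
    compileOnly("org.apache.spark:spark-sql_2.12:3.5.1")
    // Iceberg's Spark runtime is suffixed with both the Spark and the Scala version
    runtimeOnly("org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0")
    // Gatling itself is written in Scala and pulls in scala-library transitively
    testImplementation("io.gatling:gatling-core:3.10.5")
}
```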

@jbonofre (Member) commented

For the benchmarks, it depends on where they land. If they go into polaris-tools, the Scala infra should be there.

For the Spark plugin, it's a good point to have the Scala infra there.

@gh-yzou (Contributor) commented Mar 20, 2025

@eric-maynard The intent in #1190 is not to use Scala; the Scala version in that PR is only used to pull in the right Spark and Iceberg dependencies, for example spark_sql_2.12 and iceberg_spark_runtime_3.5-2.12, etc. I do not intend to introduce any Scala classes for the plugin either.

@gh-yzou (Contributor) commented Mar 20, 2025

@snazy @jbonofre Could you explain in a bit more detail the claim that "the Spark plugin requires Scala build infra"? My understanding is that a Scala build is only required if we introduce our own Scala classes, but we can write the plugin code in Java, like SparkCatalog in Iceberg. For dependencies like spark_sql and iceberg_spark_runtime, as long as we pull the library built for the correct Scala version, no recompilation is needed, right?

@snazy (Member, Author) commented Mar 20, 2025

@gh-yzou you mentioned the reason yourself in "spark_sql_2.12 and iceberg_spark_runtime_3.5-2.12" - "2.12" (and 2.13) are Scala versions, right?

@gh-yzou (Contributor) commented Mar 20, 2025

@snazy But those libraries are already compiled; I do not need to recompile them, right? I am not sure I understand how the Scala build infra will be used if we are not building new Scala classes. Could you give more insight on this?

@snazy (Member, Author) commented Mar 20, 2025

You need to have the right dependencies in place - and that is what this change + #1203 do

@gh-yzou (Contributor) commented Mar 20, 2025

Change #1203 assumes there will be Scala classes there, like ScalaStuff.scala, and therefore requires the Scala build. However, there is no strict requirement that we develop the Spark plugin in Scala. If we do not introduce any new Scala code, I am not sure why we would need the Scala build infra.
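
To make the distinction in the comment above concrete, here is a hypothetical Gradle (Kotlin DSL) sketch, not the actual #1203 or #1208 build files: consuming prebuilt Scala-versioned artifacts needs only the Java plugin, while Gradle's Scala plugin only becomes necessary once a module compiles its own .scala sources.

```kotlin
// Hypothetical sketch; module layout and versions are assumptions.

// A plain Java module can consume prebuilt, Scala-versioned artifacts without any
// Scala toolchain: the "_2.12" suffix is simply part of the artifact name.
plugins {
    `java-library`
}

dependencies {
    compileOnly("org.apache.spark:spark-sql_2.12:3.5.1") // illustrative version
}

// Only a module with its own sources under src/main/scala (e.g. ScalaStuff.scala)
// would additionally need Gradle's Scala plugin and a Scala standard library, roughly:
//
//   plugins { scala }
//   dependencies { implementation("org.scala-lang:scala-library:2.12.19") }
```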

@snazy (Member, Author) commented Mar 20, 2025

If you don't have Scala code, then it won't change anything, except give you the right dependencies. But #1208 does have Scala code.

@eric-maynard (Contributor) commented

I do like Scala, but I do not think we should make the decision of introducing a new language [dependency] lightly. Maybe it's better to discuss this on #1208, but if it's possible to avoid a new language [dependency] at this time I think we should avoid it.

@gh-yzou (Contributor) commented Mar 20, 2025

Right, I am just trying to clarify that a Scala build is not strictly required by the Spark plugin. As @jbonofre pointed out, #1208 might fit better in the polaris-tools repository. As @eric-maynard suggested, we could probably move the discussion to PR #1208, and it would be great if we could avoid introducing a new language in Polaris when it is not necessary.

@snazy (Member, Author) commented Mar 20, 2025

Well, the Scala language implications are implicitly introduced with the Spark plugin.
I don't mind removing the Scala compilation here, but the dependency stuff needs to happen.

@gh-yzou (Contributor) commented Mar 20, 2025

By "dependency stuff", do you mean we need to use the correct implementation(org.apache.spark:spark-sql_:) library in our plugin development? If that is what you mean, I think we are already doing that: we configure the Spark/Scala versions in a spark-scala.properties file, like Nessie does, and add all targets to the default build (see the sketch below).
My point is that this only configures the dependency library versions; the plugin does not need a Scala build as long as it introduces no new Scala classes. So the Scala build infra is not needed by the Spark plugin, at least for now. If it turns out we really need a dependency on Scala itself, we can always add the build later.
For PR #1208, if we decide to go with Scala, it will be required there, but there is still a discussion about where that change should go.
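
For illustration, here is a hypothetical version of the properties-file approach described in the comment above; the file name, keys, and versions are assumptions, not the actual Polaris or Nessie configuration.

```kotlin
// build.gradle.kts sketch: read the Spark/Scala versions from a properties file and use
// them only to select prebuilt artifacts; no Scala sources are compiled here.
import java.util.Properties

plugins {
    `java-library`
}

// Assumed contents of spark-scala.properties:
//   scalaVersion=2.12
//   sparkVersion=3.5.1
val sparkScalaProps = Properties().apply {
    file("spark-scala.properties").inputStream().use { load(it) }
}
val scalaVersion: String = sparkScalaProps.getProperty("scalaVersion")
val sparkVersion: String = sparkScalaProps.getProperty("sparkVersion")

dependencies {
    // The Scala binary version only selects which prebuilt Spark artifact is resolved.
    compileOnly("org.apache.spark:spark-sql_$scalaVersion:$sparkVersion")
}
```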

github-actions bot commented

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions bot added the Stale label Apr 22, 2025