
feat(spark): build and publish multi-variant targets #707

Draft

andrew-coleman wants to merge 3 commits into substrait-io:main from andrew-coleman:spark4

Conversation


andrew-coleman (Member) commented Feb 20, 2026

This PR adds the framework to build and publish multiple Substrait Spark packages targeted at different versions of Scala and Spark.

This is a large PR; however, it makes no functional changes. It is split into three commits:

  1. Upgrade the Spark dependency to version 4.0.2.
    • There were a few breaking API changes that had to be resolved.
    • Two new functions, lpad and rpad, had to be added because the Spark 4.0.2 query planner/optimizer inserted them for many of the TPCDS queries. The dialect.yaml was regenerated accordingly.
  2. Implement multi-variant build and publish.
    • Supporting the version matrix:
      • Scala 2.12 and 2.13.
      • Spark 3.4, 3.5 and 4.0.
    • Added build targets:
      • Spark 3.4 with Scala 2.12
      • Spark 3.5 with Scala 2.12
      • Spark 4.0 with Scala 2.13
    • Extra targets can be added later; a README describing how has been added.
  3. Implement the code for each of these target versions.
    • Since there were breaking API changes between v3.x and v4.x, this commit refactors the code to abstract the affected API calls into a compatibility interface that is implemented for each version (see the sketch after this list).
    • The dialect generator was updated to order the functions alphabetically by name. Previously the order depended on how Scala iterates over the contents of a map, and the two Scala versions do that differently; the newly generated dialect file in this commit is simply a re-ordered version of the previous one (see the ordering sketch below).
    • The Gradle script was updated to support publication of the three targets:
      • spark34_2.12, spark35_2.12 and spark40_2.13
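
For illustration, here is a minimal sketch of the compatibility-interface pattern in Scala. The trait and object names (SparkApiCompat, Spark35Compat, Spark40Compat) are hypothetical stand-ins, not the names used in this PR:

```scala
// Hypothetical sketch: shared code depends only on the compat trait; each
// Spark/Scala target's source set supplies its own implementation.
trait SparkApiCompat {
  // Stand-in for a Spark call whose signature changed between 3.x and 4.x.
  def sparkLabel: String
}

// Compiled only in the Spark 3.5 / Scala 2.12 source set (hypothetical).
object Spark35Compat extends SparkApiCompat {
  override def sparkLabel: String = "spark35_2.12"
}

// Compiled only in the Spark 4.0 / Scala 2.13 source set (hypothetical).
object Spark40Compat extends SparkApiCompat {
  override def sparkLabel: String = "spark40_2.13"
}

// Version-agnostic caller: no direct dependency on a version-specific Spark API.
object CompatDemo {
  def describe(compat: SparkApiCompat): String =
    s"built against ${compat.sparkLabel}"
}
```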
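
Similarly, a sketch of the deterministic ordering now applied by the dialect generator: sorting the function entries by name before emitting YAML removes the dependence on Map iteration order, which differs between Scala 2.12 and 2.13. The map contents and file names below are illustrative only:

```scala
object DialectOrderingSketch {
  def main(args: Array[String]): Unit = {
    // Illustrative function map; the real entries come from the dialect generator.
    val functions: Map[String, String] = Map(
      "rpad" -> "functions_string.yaml",
      "lpad" -> "functions_string.yaml",
      "add"  -> "functions_arithmetic.yaml"
    )

    // Sort by function name so the generated dialect file comes out the same
    // no matter which Scala version runs the generator.
    val ordered: Seq[(String, String)] = functions.toSeq.sortBy(_._1)

    ordered.foreach { case (name, source) =>
      println(s"- name: $name  # defined in $source")
    }
  }
}
```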

andrew-coleman marked this pull request as draft on February 20, 2026 at 14:49
WIP

Signed-off-by: Andrew Coleman <andrew_coleman@uk.ibm.com>
Adds support for building/publishing multiple targets for
different versions of Scala (2.12, 2.13) and Spark (3.4, 3.5, 4.0).
Other combinations can be added in the future.

Signed-off-by: Andrew Coleman <andrew_coleman@uk.ibm.com>
andrew-coleman force-pushed the spark4 branch 4 times, most recently from 6330a71 to 9216c45 on February 20, 2026 at 16:06
There are a number of breaking Spark API changes between
3.x and 4.x.
This commit refactors the code to abstract affected API calls
into a compatibility interface which can be implemented by each version.

Signed-off-by: Andrew Coleman <andrew_coleman@uk.ibm.com>