
added fallback using reflection for backward-compatibility #1573

Open · wants to merge 4 commits into main

Conversation

YanivKunda (Contributor):

Which issue does this PR close?

Closes #1572

Rationale for this change

Users on older 3.5.x versions (before Spark 3.5.5) should be able to use the latest Comet version.

What changes are included in this PR?

Modifies the 3.5 ShimCometScanExec shim to fall back to reflection with the previous method signatures if the latest one is not found.
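
For illustration, a minimal sketch of the fallback pattern in Scala (the helper name and the arity-based overload matching are simplifications for this sketch, not the exact code in the diff):

object ReflectiveFallback {
  import java.lang.reflect.Method

  // Sketch only: try each candidate argument list in order and invoke the first
  // overload whose arity matches. Matching on parameter count alone is a
  // simplification; the real shim calls the compiled signature first and only
  // falls back to reflection on NoSuchMethodError.
  def invokeFirstMatching(target: AnyRef, name: String, candidateArgs: Seq[Seq[AnyRef]]): AnyRef = {
    val overloads: Array[Method] = target.getClass.getMethods.filter(_.getName == name)
    candidateArgs.iterator
      .map(args => overloads.find(_.getParameterCount == args.length).map(_.invoke(target, args: _*)))
      .collectFirst { case Some(result) => result }
      .getOrElse(throw new NoSuchMethodException(name))
  }
}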

How are these changes tested?

Ran mvn test with the spark-3.5 profile.
Added 3.5.4 to the matrix of tested Spark versions in spark_sql_test.yml.

@@ -45,7 +45,7 @@ jobs:
     matrix:
       os: [ubuntu-24.04]
       java-version: [11]
-      spark-version: [{short: '3.4', full: '3.4.3'}, {short: '3.5', full: '3.5.5'}]
+      spark-version: [{short: '3.4', full: '3.4.3'}, {short: '3.5', full: '3.5.4'}, {short: '3.5', full: '3.5.5'}]
Contributor:

Since we advanced the minimum version to 3.5.5, I am not sure we should add back 3.5.4 tests.
SQL tests can be run here, but Comet's own tests are no longer run with 3.5.4.

If we decide to support 3.5.4, I would vote for doing it by adding a Maven profile.

Contributor Author:

Adding a Maven profile using Spark 3.5.4 would break compilation for the 3.5 shim.
So to support both 3.5.4 and 3.5.5 in compilation, we would need either separate shims for each patch version,
or the 3.5 shim to support all versions through reflection.

Contributor:

I would say we should create separate shims for Spark 3.5.4 where necessary. I know that right now we only provide profiles at the 3.4, 3.5, and 4.0 level, but I have observed a couple of API changes between patch versions.

Currently, a Comet release for 3.5 does not mean it works with all 3.5.x versions. The minimum version will be 3.5.5 in the next release. Either we drop 3.5.4 support, or we create a new profile.

Contributor Author:

I went for the reflection approach after my post-merge discussion with @wForget in #1565,
since creating different shims for 3.5.4 also means a different release for 3.5.4, which might cause confusion for users.
Since I did the upgrade to 3.5.5 separately, dropping support for 3.5.4 simply means discarding this PR.

Member:

We don't need to add 3.5.4 CI, just make sure it can be compiled successfully. Or how about deleting it after verification in this PR?

Contributor Author:

But it can't be compiled successfully, for the same reason 3.5.5 couldn't be compiled when the 3.5 shims were compiled against 3.5.4...

try {
  PartitionedFileUtil.getPartitionedFile(f, f.getPath, p.values)
} catch {
  case _: NoSuchMethodError =>
Member:

How about directly checking the Spark version? Like:

if (org.apache.spark.SPARK_VERSION_SHORT >= "3.5.5") {
  ... for spark 3.5.5+
} else {
  ... for spark 3.5.4
}

Contributor Author:

And keeping the current code paths? (i.e. compiled code for 3.5.5, else reflection)
That is more explicit, but I'm not sure it brings any benefit.
The advantage of the try-catch idea is that if this parameter goes away in 3.5.6 (like it did in 4.0.0-preview1/2, while it's back in main...) the code will remain both backward- and forward-compatible.

Contributor:

Reflection is OK with me. I just wanted to make sure we had dropped 3.5.4 support.
If we decide to support 3.5.4 as a community, I think we should test with 3.5.4 (downgraded in the pom) all the way.

Member:

I prefer maintaining compatibility with more Spark patch versions to increase adoption. If Comet is known not to work with Spark <= 3.5.4, we had better explicitly document that in Supported Spark Versions.

Member:

I agree that we should support multiple 3.5.x versions. Some users do not have control over the minor Spark version they use. There isn't any technical reason why we cannot maintain diff files for multiple Spark versions and test multiple versions in CI.

Member:

I would prefer to see reflection used for both 3.5.0-3.5.4 and 3.5.5 rather than catching a runtime exception, as @wForget suggests.

Contributor Author:

Added a commit to always invoke the two methods that differ between 3.5.x versions using reflection.
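
A sketch of that approach in Scala: resolve each reflective Method once and always invoke it, so the same shim binary works across 3.5.x. The class name follows the snippet quoted earlier in this thread; the Compat object and the arities (three arguments on 3.5.5+, two on earlier 3.5.x) are assumptions for illustration, not the committed code:

import java.lang.reflect.Method
import scala.util.Try

// Sketch only: a Scala object compiles to a class with a trailing '$' that
// exposes a MODULE$ singleton field, which is what we reflect against.
object PartitionedFileUtilCompat {
  private def resolve(arity: Int): Try[(AnyRef, Method)] = Try {
    val cls = Class.forName("org.apache.spark.sql.execution.PartitionedFileUtil$")
    val module = cls.getField("MODULE$").get(null)
    val method = cls.getMethods
      .find(m => m.getName == "getPartitionedFile" && m.getParameterCount == arity)
      .getOrElse(throw new NoSuchMethodException(s"getPartitionedFile/$arity"))
    (module, method)
  }

  // Assumed arities: 3 on Spark 3.5.5+ (file, filePath, partitionValues),
  // 2 on 3.5.0-3.5.4 (file, partitionValues). Resolved once, then cached.
  private lazy val resolved: (AnyRef, Method) = resolve(3).orElse(resolve(2)).get

  // Callers inspect the arity to decide whether to pass the filePath argument.
  def arity: Int = resolved._2.getParameterCount

  def invoke(args: AnyRef*): AnyRef = resolved._2.invoke(resolved._1, args: _*)
}

Caching the resolved Method keeps the per-call overhead to the reflective invoke itself rather than a lookup on every file.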

codecov-commenter commented Mar 29, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 58.49%. Comparing base (f09f8af) to head (58848ab).
Report is 105 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1573      +/-   ##
============================================
+ Coverage     56.12%   58.49%   +2.36%     
- Complexity      976      977       +1     
============================================
  Files           119      122       +3     
  Lines         11743    12231     +488     
  Branches       2251     2278      +27     
============================================
+ Hits           6591     7154     +563     
+ Misses         4012     3945      -67     
+ Partials       1140     1132       -8     

Successfully merging this pull request may close these issues:

NoSuchMethodError: PartitionedFileUtil$.splitFiles when running with Spark 3.5.4