feat: improve type coercion and validation for datafusion execution #24120
Warning: This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite.
Pull request overview
This PR enhances the DataFusion integration in RisingWave by refactoring the type system interface and improving type conversion handling between RisingWave and DataFusion. The changes enable better interoperability for complex operations like joins, binary expressions, and cast operations while ensuring type safety.
Key Changes:
- Refactored ColumnTrait to provide both RisingWave and DataFusion type information, enabling proper type conversions during expression evaluation
- Added pre-evaluation type safety checks for binary operations using DataFusion's BinaryTypeCoercer to validate type compatibility before conversion
- Introduced CastExecutor to handle type conversions more systematically when falling back to RisingWave expressions
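The idea of a column abstraction that reports both type systems, and of concatenating two such column sets for a join, could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's code: `RwType`, `DfType`, `ColumnTypes`, `Schema`, and `Concat` are hypothetical names and heavily simplified stand-ins for the real RisingWave and DataFusion types.

```rust
// Hypothetical, heavily simplified stand-ins for the two type systems.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RwType { Int32, Int64, Varchar }

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DfType { Int32, Int64, Utf8 }

/// Hypothetical trait: a column set that reports both types per column index.
pub trait ColumnTypes {
    fn len(&self) -> usize;
    fn rw_type(&self, idx: usize) -> Option<RwType>;
    fn df_type(&self, idx: usize) -> Option<DfType>;
}

/// A plain schema storing one (RisingWave, DataFusion) type pair per column.
pub struct Schema {
    pub cols: Vec<(RwType, DfType)>,
}

impl ColumnTypes for Schema {
    fn len(&self) -> usize { self.cols.len() }
    fn rw_type(&self, idx: usize) -> Option<RwType> { self.cols.get(idx).map(|c| c.0) }
    fn df_type(&self, idx: usize) -> Option<DfType> { self.cols.get(idx).map(|c| c.1) }
}

/// Join view: left columns first, then right columns, without copying either side.
pub struct Concat<'a> {
    pub left: &'a dyn ColumnTypes,
    pub right: &'a dyn ColumnTypes,
}

impl Concat<'_> {
    // Map an index of the joined schema to the side that owns it.
    fn side(&self, idx: usize) -> (&dyn ColumnTypes, usize) {
        let n = self.left.len();
        if idx < n { (self.left, idx) } else { (self.right, idx - n) }
    }
}

impl ColumnTypes for Concat<'_> {
    fn len(&self) -> usize { self.left.len() + self.right.len() }
    fn rw_type(&self, idx: usize) -> Option<RwType> {
        let (cols, i) = self.side(idx);
        cols.rw_type(i)
    }
    fn df_type(&self, idx: usize) -> Option<DfType> {
        let (cols, i) = self.side(idx);
        cols.df_type(i)
    }
}
```

With a one-column left side and a two-column right side, index 1 of the `Concat` view resolves to the first right-hand column, and an out-of-range index yields `None` rather than panicking.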
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/frontend/src/optimizer/plan_visitor/datafusion_plan_converter.rs | Updated to use refactored ColumnTrait interface that tracks both DF and RW schemas; introduced ConcatColumns for join operations |
| src/frontend/src/datafusion/function.rs | Added type safety checks for binary operations; refactored cast handling; introduced is_datafusion_native() extension method; updated fallback expression builder to use CastExecutor |
| src/frontend/src/datafusion/execute.rs | Refactored CastExecutor into a public struct with new() and from_iter() constructors; improved chunk ownership handling; refined timeout logic |
| src/frontend/src/datafusion/convert.rs | Enhanced ColumnTrait to include type information methods; implemented ConcatColumns struct for join column handling |
| e2e_test/iceberg/test_case/pure_slt/iceberg_datafusion_engine.slt | Enabled previously commented test case for date arithmetic with intervals |

    // TODO: some optimizing rules will cause inconsistency, need to investigate later
    // Currently we disable all optimizing rules to ensure correctness
    let df_plan = state.analyzer().execute_and_check(
        plan.plan.as_ref().clone(),
        &ConfigOptions::default(),
        |_, _| {},
    )?;

Copilot AI, Dec 15, 2025
The comment states "Currently we disable all optimizing rules" but the code calls state.analyzer().execute_and_check() which may still run analysis passes. If the intent is to completely disable optimization, consider clarifying what this analyzer call does and whether it's truly disabling all optimization rules, or update the comment to accurately reflect what analysis/optimization is still being performed.

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
This PR tightens type handling and improves correctness when executing scalar functions and expressions via DataFusion.
Key changes
- Insert explicit type casts before evaluating scalar functions. This ensures inputs conform to expected types and avoids relying on implicit or incomplete coercion.
- Strengthen validation in convert_function_call:
  - RegexMatch: DataFusion currently lacks full feature support for it.
  - Use BinaryTypeCoercer to explicitly validate whether a given binary operation is supported by DataFusion.
- Enable DataFusion’s analyzer in execute_datafusion_plan. This allows DataFusion to apply its own type coercion and analysis logic, improving compatibility and reducing manual edge-case handling.
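The first two changes (validate a binary operation up front, then insert explicit casts so both inputs share one type) can be illustrated with a toy model. This is a sketch under stated assumptions: `Ty`, `Expr`, `common_type`, and `plan_binary` are hypothetical, and the small coercion table here merely stands in for the much richer rules of DataFusion's BinaryTypeCoercer.

```rust
// Hypothetical, simplified type and expression model (not the PR's types).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Ty { Int32, Int64, Float64, Utf8 }

#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column { idx: usize, ty: Ty },
    Cast { input: Box<Expr>, to: Ty },
}

impl Expr {
    /// Output type of the expression.
    pub fn ty(&self) -> Ty {
        match self {
            Expr::Column { ty, .. } => *ty,
            Expr::Cast { to, .. } => *to,
        }
    }
}

/// Toy coercion table: the common input type, or None if the pair is unsupported.
pub fn common_type(lhs: Ty, rhs: Ty) -> Option<Ty> {
    use Ty::*;
    match (lhs, rhs) {
        (a, b) if a == b => Some(a),
        (Int32, Int64) | (Int64, Int32) => Some(Int64),
        (Int32 | Int64, Float64) | (Float64, Int32 | Int64) => Some(Float64),
        _ => None,
    }
}

/// Wrap `expr` in an explicit cast unless it already has the target type.
pub fn cast_to(expr: Expr, to: Ty) -> Expr {
    if expr.ty() == to {
        expr
    } else {
        Expr::Cast { input: Box::new(expr), to }
    }
}

/// Check the operand types first; only on success rewrite both sides with
/// explicit casts. `None` signals that the caller should take a fallback path.
pub fn plan_binary(lhs: Expr, rhs: Expr) -> Option<(Expr, Expr)> {
    let target = common_type(lhs.ty(), rhs.ty())?;
    Some((cast_to(lhs, target), cast_to(rhs, target)))
}
```

For an Int32 column and an Int64 column, `plan_binary` wraps only the Int32 side in a cast to Int64. For an Int32 column and a Utf8 column it returns `None`, so the unsupported combination is rejected before any conversion is attempted.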
Checklist
Documentation