Skip to content

Conversation

@silver-ymz
Copy link
Contributor

@silver-ymz silver-ymz commented Dec 15, 2025

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

This PR tightens type handling and improves correctness when executing scalar functions and expressions via DataFusion.

Key changes

  1. Insert explicit type casts before evaluating scalar functions
    Ensures inputs conform to expected types and avoids relying on implicit or incomplete coercion.

  2. Strengthen validation in convert_function_call

    • Enforce stricter checks on DataFusion native types to prevent unsupported or ambiguous conversions.
    • Remove RegexMatch, as DataFusion currently lacks full feature support for it.
    • Use BinaryTypeCoercer to explicitly validate whether a given binary operation is supported by DataFusion.
  3. Enable DataFusion’s analyzer in execute_datafusion_plan
    This allows DataFusion to apply its own type coercion and analysis logic, improving compatibility and reducing manual edge-case handling.

Checklist

  • I have written necessary rustdoc comments.
  • I have added necessary unit tests and integration tests.
  • I have added test labels as necessary.
  • I have added fuzzing tests or opened an issue to track them.
  • My PR contains breaking changes.
  • My PR changes performance-critical code, so I will run (micro) benchmarks and present the results.
  • I have checked the Release Timeline and Currently Supported Versions to determine which release branches I need to cherry-pick this PR into.

Documentation

  • My PR needs documentation updates.

Copy link
Contributor Author

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

@silver-ymz silver-ymz marked this pull request as ready for review December 15, 2025 11:35
Copilot AI review requested due to automatic review settings December 15, 2025 11:35
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enhances the DataFusion integration in RisingWave by refactoring the type system interface and improving type conversion handling between RisingWave and DataFusion. The changes enable better interoperability for complex operations like joins, binary expressions, and cast operations while ensuring type safety.

Key Changes:

  • Refactored ColumnTrait to provide both RisingWave and DataFusion type information, enabling proper type conversions during expression evaluation
  • Added pre-evaluation type safety checks for binary operations using DataFusion's BinaryTypeCoercer to validate type compatibility before conversion
  • Introduced CastExecutor to handle type conversions more systematically when falling back to RisingWave expressions

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/frontend/src/optimizer/plan_visitor/datafusion_plan_converter.rs Updated to use refactored ColumnTrait interface that tracks both DF and RW schemas; introduced ConcatColumns for join operations
src/frontend/src/datafusion/function.rs Added type safety checks for binary operations; refactored cast handling; introduced is_datafusion_native() extension method; updated fallback expression builder to use CastExecutor
src/frontend/src/datafusion/execute.rs Refactored CastExecutor into a public struct with new() and from_iter() constructors; improved chunk ownership handling; refined timeout logic
src/frontend/src/datafusion/convert.rs Enhanced ColumnTrait to include type information methods; implemented ConcatColumns struct for join column handling
e2e_test/iceberg/test_case/pure_slt/iceberg_datafusion_engine.slt Enabled previously commented test case for date arithmetic with intervals

Comment on lines +62 to +68
// TODO: some optimizing rules will cause inconsistency, need to investigate later
// Currently we disable all optimizing rules to ensure correctness
let df_plan = state.analyzer().execute_and_check(
plan.plan.as_ref().clone(),
&ConfigOptions::default(),
|_, _| {},
)?;
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment states "Currently we disable all optimizing rules" but the code calls state.analyzer().execute_and_check() which may still run analysis passes. If the intent is to completely disable optimization, consider clarifying what this analyzer call does and whether it's truly disabling all optimization rules, or update the comment to accurately reflect what analysis/optimization is still being performed.

Copilot uses AI. Check for mistakes.
@silver-ymz silver-ymz changed the title feat: add expr cast before evaling scalar funciton in datafusion feat: improve type coercion and validation for datafusion execution Dec 15, 2025
@silver-ymz silver-ymz changed the base branch from refactor/datafusion-error-handling to graphite-base/24120 December 16, 2025 09:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants