@silver-ymz
Contributor

@silver-ymz silver-ymz commented Dec 10, 2025

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Checklist

  • I have written necessary rustdoc comments.
  • I have added necessary unit tests and integration tests.
  • I have added test labels as necessary.
  • I have added fuzzing tests or opened an issue to track them.
  • My PR contains breaking changes.
  • My PR changes performance-critical code, so I will run (micro) benchmarks and present the results.
  • I have checked the Release Timeline and Currently Supported Versions to determine which release branches I need to cherry-pick this PR into.

Documentation

  • My PR needs documentation updates.
Release note

@silver-ymz silver-ymz marked this pull request as ready for review December 10, 2025 05:41
Copilot AI review requested due to automatic review settings December 10, 2025 05:41
Contributor

Copilot AI left a comment

Pull request overview

This PR adds support for DeleteScan operations (both equality and position deletes) to the DataFusion engine for Iceberg tables. Previously, only DataScan was supported. The changes enable proper handling of delete files in Iceberg's copy-on-write operations.

Key changes:

  • Converted IcebergScan::new() to an async function to support pre-loading file scan tasks during initialization
  • Added list_iceberg_scan_task() method to enumerate and filter data/delete files based on scan type
  • Implemented metadata column appending (sequence number, file path, file position) for tracking row lineage
  • Changed partition handling from single partition to multi-partition based on file scan tasks
  • Removed hidden column filtering to expose Iceberg metadata columns through the DataFusion schema
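The task filtering and metadata-column ideas in the list above can be sketched roughly as follows. This is a minimal, std-only illustration and not the code from this PR: `ScanType`, `FileContent`, `FileTask`, `select_tasks`, and `lineage_for_rows` are hypothetical stand-ins, and the mapping from scan type to file content is an assumption made for the example.

```rust
// Hypothetical, simplified stand-ins; these are NOT the types used in the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ScanType {
    Data,
    EqualityDelete,
    PositionDelete,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FileContent {
    Data,
    EqualityDeletes,
    PositionDeletes,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct FileTask {
    path: String,
    content: FileContent,
    sequence_number: i64,
}

/// Keep only the files whose content matches the requested scan type
/// (assumed one-to-one mapping, for illustration only).
fn select_tasks(files: &[FileTask], scan_type: ScanType) -> Vec<FileTask> {
    let wanted = match scan_type {
        ScanType::Data => FileContent::Data,
        ScanType::EqualityDelete => FileContent::EqualityDeletes,
        ScanType::PositionDelete => FileContent::PositionDeletes,
    };
    files.iter().filter(|f| f.content == wanted).cloned().collect()
}

/// Build the metadata values that would be appended to each row of a file:
/// (sequence number, file path, row position within the file).
fn lineage_for_rows(task: &FileTask, row_count: usize) -> Vec<(i64, String, i64)> {
    (0..row_count)
        .map(|pos| (task.sequence_number, task.path.clone(), pos as i64))
        .collect()
}
```

In this sketch, every row carries the sequence number and path of the file it came from, plus its zero-based position in that file, which is the kind of information position and equality deletes need in order to be matched against data rows.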

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| src/frontend/src/datafusion/iceberg_table_provider.rs | Made scan() async and removed hidden column filter to expose Iceberg metadata columns |
| src/frontend/src/datafusion/iceberg_executor.rs | Added delete scan support with task enumeration, multi-partition execution, and metadata column handling |
| e2e_test/iceberg/test_case/pure_slt/iceberg_datafusion_engine.slt | Added test cases for delete operations with DataFusion engine |

Comments suppressed due to low confidence (1)

src/frontend/src/datafusion/iceberg_executor.rs:64

  • The #[allow(dead_code)] attributes on need_seq_num and need_file_path_and_pos are no longer needed as these fields are now actively used in the execute_inner method (lines 265-266). These attributes should be removed.
    need_seq_num: bool,
    #[allow(dead_code)]
    need_file_path_and_pos: bool,

Signed-off-by: Mingzhuo Yin <[email protected]>

        plan_properties,
    };
    inner.tasks = inner.list_iceberg_scan_task().try_collect().await?;
    inner.plan_properties.partitioning = Partitioning::UnknownPartitioning(inner.tasks.len());

Contributor

We have a session variable called batch_parallelism. I think we can use this value to control the parallelism instead of using the task number directly, because the task number could be very large.

Contributor Author

Done in a new commit.
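For readers following this thread, the suggestion to cap parallelism can be sketched as below. This is a hedged, std-only illustration of the idea, not the change that was actually committed: `assign_partitions` is a hypothetical helper, and round-robin assignment is an assumption chosen for simplicity.

```rust
/// Distribute scan tasks over at most `batch_parallelism` partitions.
///
/// Hypothetical helper for illustration; the real executor may group
/// tasks differently (for example by file size).
fn assign_partitions<T>(tasks: Vec<T>, batch_parallelism: usize) -> Vec<Vec<T>> {
    // Never create more partitions than tasks, and always create at least one
    // so that an empty scan still has a single (empty) partition.
    let partition_count = batch_parallelism.min(tasks.len()).max(1);

    let mut partitions: Vec<Vec<T>> = (0..partition_count).map(|_| Vec::new()).collect();

    // Round-robin assignment keeps partition sizes within one task of each other.
    for (index, task) in tasks.into_iter().enumerate() {
        partitions[index % partition_count].push(task);
    }
    partitions
}
```

With 10 tasks and a parallelism of 4, this yields 4 partitions instead of 10, so the partition count reported to the planner (for example via `Partitioning::UnknownPartitioning`) stays bounded even when a table has a very large number of files.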

Contributor Author

silver-ymz commented Dec 12, 2025

Contributor

@chenzl25 chenzl25 left a comment

LGTM!

@silver-ymz silver-ymz added this pull request to the merge queue Dec 12, 2025
Merged via the queue into main with commit 389c509 Dec 12, 2025
36 checks passed
@silver-ymz silver-ymz deleted the feat/datafusion-delete-scan branch December 12, 2025 09:08
EdwinaZhu pushed a commit to EdwinaZhu/risingwave that referenced this pull request Dec 15, 2025

3 participants