Add custom MultiFileReader for reading delete files #674
base: v1.4-andium
Conversation
pdet
left a comment
Thanks, a couple more comments!
require parquet

statement ok
ATTACH 'ducklake:${DUCKLAKE_CONNECTION}' AS ducklake (DATA_PATH 's3://mybucket')
Other small nit
test-env DATA_PATH __TEST_DIR__
statement ok
ATTACH 'ducklake:${DUCKLAKE_CONNECTION}' AS ducklake (DATA_PATH '${DATA_PATH}/delete_metadata')
This test will still only be executed with minio due to the require-env, but it could eventually be executed in different scenarios in the future.
Yes actually we cannot do this because of duckdb/duckdb#20396 - once the 1.4 branch updates its dependency to a more recent duckdb 1.4 tag - we'll be able to do this.
query II
EXPLAIN ANALYZE SELECT COUNT(*) FILTER(WHERE id%2=0) FROM ducklake.test
----
analyzed_plan <REGEX>:.*#HEAD: 0.*

# we can time travel to see the state of the table before deletes
query II
EXPLAIN ANALYZE SELECT COUNT(*) FILTER(WHERE id%2=0) FROM ducklake.test AT (VERSION => 2)
----
analyzed_plan <REGEX>:.*#HEAD: 0.*
I've actually got this branch locally and did a build without your changes in src/storage/ducklake_delete_filter.cpp, and the tests seem to have the same effect? Maybe I'm missing something?
Hmm 🤔 I'll check this again on behalf of Mathieu who initially opened this MR. Thank you for this @pdet, I should have double-checked as well 🫡
@pdet Because of our local development setups we had a lot of trouble reproducing failure / success with and without the fix. That being said, it looks like the base commit we were using might have caused the flakiness you observed.
I went and created a Dockerfile (based on the Minio.yml workflow config to be as close as possible to the CI):
# Based on the MinIO.yml workflow configuration
FROM ubuntu:latest
# Prevent interactive prompts during package installation
ENV DEBIAN_FRONTEND=noninteractive
# Install required Ubuntu packages
RUN apt-get update -y -qq && \
apt-get install -y -qq \
software-properties-common \
curl \
git \
zip \
unzip \
tar \
pkg-config && \
add-apt-repository ppa:git-core/ppa && \
apt-get update -y -qq && \
apt-get install -y -qq \
build-essential \
cmake \
ninja-build \
ccache \
python3 \
clang \
llvm && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
# Set working directory
WORKDIR /workspace
# Copy the project
COPY . .
# Setup vcpkg
RUN git clone https://github.com/microsoft/vcpkg.git /workspace/vcpkg && \
cd /workspace/vcpkg && \
git checkout ce613c41372b23b1f51333815feb3edd87ef8a8b && \
./bootstrap-vcpkg.sh
# Set build environment variables
ENV VCPKG_TARGET_TRIPLET=x64-linux-release
ENV GEN=ninja
ENV VCPKG_FEATURE_FLAGS=-binarycaching
ENV VCPKG_TOOLCHAIN_PATH=/workspace/vcpkg/scripts/buildsystems/vcpkg.cmake
ENV BUILD_EXTENSION_TEST_DEPS=full
ENV S3_TEST_SERVER_AVAILABLE=1
ENV CC=clang
ENV CXX=clang++
# Show versions for verification
RUN ninja --version && \
cmake --version && \
clang --version
# Build the project
RUN make release
# Default command
CMD ["/bin/bash"]
Building the image with

docker build -t ducklake-build .

and running the test (with a minio instance running on localhost:9000) with

docker run --network host -it ducklake-build bash -c "S3_TEST_SERVER_AVAILABLE=1 ./build/release/test/unittest --test-config test/configs/minio.json test/sql/delete/delete_metadata.test"

shows that:
- Without the `ducklake_delete_filter.cpp` fix of this PR, the test fails
- With the fix, the test succeeds.
extended_info->options["etag"] = Value("");
extended_info->options["last_modified"] = Value::TIMESTAMP(timestamp_t(0));
@samansmink since you are more aware than I am of the effects of S3 files in the multifile reader, can this have any ill effects?
Closing this for now; the bug is somewhere between https://github.com/duckdb/ducklake and https://github.com/dentiny/duck-read-cache-fs - we need to figure out how to test this from here if we want to fix it here.
Add custom MultiFileReader for reading delete files
**Context**: We experience slow read performance when a table has many delete files.
**TL;DR**: We can leverage the metadata already available in DuckLake to improve load time of delete files.
**Problem & Motivation:**
DuckLake stores `file_size` metadata for both data and delete files. For data files, there is already a mechanism to forward this metadata to the MultiFileReader and the underlying filesystem. The Parquet reader requires this `file_size` to access the footer metadata. When using an `HTTPFileSystem` instance (e.g., for S3, Azure), it performs a HEAD request on the file if metadata fields (`file_size`, `etag`, `last_modified`) are not present. Since all files in DuckLake are immutable, we can apply the same optimization logic for delete files to avoid these unnecessary HEAD requests.
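Concretely, the HEAD request is only skipped when the open-file metadata is already populated before the filesystem is asked to open the file. The following is a minimal sketch of that pre-population, assuming DuckDB's `OpenFileInfo`/`ExtendedOpenFileInfo` structs; the helper name and the exact option key strings are illustrative assumptions, not the code of this PR:

```cpp
// Sketch only: build an OpenFileInfo whose extended_info already carries the
// fields HTTPFileSystem would otherwise fetch with a HEAD request.
// The helper name and option key strings are assumptions for illustration.
static OpenFileInfo PrepopulatedDeleteFile(const string &path, idx_t file_size_bytes) {
	OpenFileInfo result(path);
	result.extended_info = make_shared_ptr<ExtendedOpenFileInfo>();
	// the Parquet reader needs the file size to locate the footer without probing
	result.extended_info->options["file_size"] = Value::UBIGINT(file_size_bytes);
	// DuckLake files are immutable, so placeholder values are enough to avoid
	// a freshness check against the object store
	result.extended_info->options["etag"] = Value("");
	result.extended_info->options["last_modified"] = Value::TIMESTAMP(timestamp_t(0));
	return result;
}
```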
**Solution:**
Implements a custom multi-file reading solution that pre-populates file metadata to eliminate redundant storage HEAD requests when scanning delete files:
**Key Changes:**
1. **New `DeleteFileFunctionInfo` struct**: Extends `TableFunctionInfo` to carry `DuckLakeFileData` metadata through the table function binding process.
2. **Custom `DeleteFileMultiFileReader` class** (see the sketch after this list):
- Extends DuckDB's `MultiFileReader` to intercept file list creation
- Pre-populates `ExtendedOpenFileInfo` with metadata already available from DuckLake:
- File size (`file_size_bytes`)
- ETag (empty string as placeholder)
- Last modified timestamp (set to epoch)
- Encryption key (if present)
- Creates a `SimpleMultiFileList` with this extended info upfront
- Overrides `CreateFileList()` to return the pre-built list, bypassing DuckDB's default file discovery
3. **Modified `ScanDeleteFile()` method**:
- Changed `parquet_scan` from const reference to mutable copy to allow modification
- Attaches `DeleteFileFunctionInfo` and custom reader factory to the table function
- Passes the actual `parquet_scan` function to `TableFunctionBindInput` instead of a dummy function, ensuring proper function context
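As a rough sketch of how the struct and the reader described above could fit together (the `MultiFileReader::CreateFileList` and `SimpleMultiFileList` signatures vary between DuckDB versions, so treat this as an outline rather than the exact implementation):

```cpp
// Carries DuckLake's per-file metadata through table function binding.
// The member layout is an assumption for illustration.
struct DeleteFileFunctionInfo : public TableFunctionInfo {
	vector<DuckLakeFileData> delete_files;
};

// Returns a pre-built file list instead of letting DuckDB glob and stat paths.
class DeleteFileMultiFileReader : public MultiFileReader {
public:
	explicit DeleteFileMultiFileReader(vector<OpenFileInfo> prebuilt_files_p)
	    : prebuilt_files(std::move(prebuilt_files_p)) {
	}

	// Signature approximated; check the MultiFileReader base class of the DuckDB version in use.
	shared_ptr<MultiFileList> CreateFileList(ClientContext &context, const vector<OpenFileInfo> &paths,
	                                         FileGlobOptions options) override {
		// every entry already has extended_info filled in (see the earlier sketch),
		// so no HEAD requests are issued when the files are opened
		return make_shared_ptr<SimpleMultiFileList>(prebuilt_files);
	}

private:
	vector<OpenFileInfo> prebuilt_files;
};
```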
**Performance Impact**: Eliminates HEAD requests to object storage when opening Parquet delete files. This is particularly beneficial when working with remote storage (S3, Azure, etc.) and tables with many delete files, where HEAD requests were causing significant performance bottlenecks.
Force-pushed from 5ef6bfb to 42a586b
Aha, we actually have a very similar PR to iceberg; code-wise it looks very similar. Maybe the test we have there would be a better fit?
This follows #641, which we closed to work around a GH issue.
We noticed that read performance can degrade when a table accumulates a large number of delete files, especially when those files live on remote object storage.
TLDR; We can actually speed things up by reusing metadata that DuckLake already has, avoiding a bunch of unnecessary storage `HEAD` requests when loading delete files.
For data files, this metadata is passed down to DuckDB’s `MultiFileReader` and ultimately to the filesystem layer. This matters because the Parquet reader needs the file size to locate footer metadata efficiently. However, for delete files, this optimization wasn’t applied yet. When using an `HTTPFileSystem` (e.g. S3 or Azure), missing metadata means DuckDB issues a `HEAD` request per file to fetch it. Since DuckLake files are immutable, those extra requests are pure overhead, and they really add up when a table has many delete files.
This PR introduces a custom multi-file reading path for delete files that eagerly injects the metadata we already have, eliminating redundant storage calls.
What Changed
New `DeleteFileFunctionInfo` struct
Extends `TableFunctionInfo` to carry `DuckLakeFileData` metadata through the table function binding phase.

Custom `DeleteFileMultiFileReader`
A specialized `MultiFileReader` that:
- Pre-populates `ExtendedOpenFileInfo` with known metadata: the file size (`file_size_bytes`), placeholder ETag and last-modified values, and the encryption key if present
- Builds a `SimpleMultiFileList` upfront
- Overrides `CreateFileList()` to return this pre-built list, skipping DuckDB’s default discovery logic entirely

Updates to `ScanDeleteFile()` (a rough sketch follows below)
- Changes `parquet_scan` from a const reference to a mutable copy so it can be customized
- Attaches the `DeleteFileFunctionInfo` and the custom reader factory to the table function
- Passes the real `parquet_scan` function into `TableFunctionBindInput` (instead of a dummy), ensuring the correct execution context

The win is most noticeable on remote storage backends (S3, Azure, etc.) and for tables with many delete files, where these requests were a significant source of latency.
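For concreteness, here is a loose sketch of the `ScanDeleteFile()` wiring described above, under the assumption that `TableFunction` can be copied by value, carries a `function_info` member, and exposes a hook for supplying a custom `MultiFileReader` factory; the hook's member name, `GetParquetScan`, and the `DuckLakeFileData` fields shown are placeholders rather than the exact DuckDB/DuckLake API:

```cpp
// Factory handed to the copied parquet_scan; it builds the metadata-aware reader
// from the DeleteFileFunctionInfo attached during binding.
static unique_ptr<MultiFileReader> CreateDeleteFileReader(const TableFunction &table_function) {
	auto &info = dynamic_cast<DeleteFileFunctionInfo &>(*table_function.function_info);
	vector<OpenFileInfo> files;
	for (auto &file : info.delete_files) {
		// reuse the pre-population helper from the first sketch
		files.push_back(PrepopulatedDeleteFile(file.path, file.file_size_bytes));
	}
	return make_uniq<DeleteFileMultiFileReader>(std::move(files));
}

// Simplified stand-in for the body of ScanDeleteFile(): take a mutable copy of
// parquet_scan, attach the info and the factory, then bind with the real function.
void ScanDeleteFileSketch(ClientContext &context, const DuckLakeFileData &delete_file) {
	TableFunction parquet_scan = GetParquetScan(context);        // placeholder helper
	auto info = make_shared_ptr<DeleteFileFunctionInfo>();
	info->delete_files.push_back(delete_file);                   // metadata from the DuckLake catalog
	parquet_scan.function_info = std::move(info);
	parquet_scan.get_multi_file_reader = CreateDeleteFileReader; // hook name assumed
	// ... a TableFunctionBindInput is then constructed with this parquet_scan,
	// not a dummy function, so binding runs in the correct context ...
}
```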
Net result: faster reads, fewer network round-trips, and better scalability ✨