
Conversation

Contributor

@redox redox commented Jan 7, 2026

This follows #641, which we closed to work around a GitHub issue.

We noticed that read performance can degrade when a table accumulates a large number of delete files, especially when those files live on remote object storage.

TL;DR: We can speed things up by reusing metadata that DuckLake already has, avoiding many unnecessary storage HEAD requests when loading delete files.

For data files, this metadata is passed down to DuckDB’s MultiFileReader and ultimately to the filesystem layer. This matters because the Parquet reader needs the file size to locate the footer metadata efficiently. For delete files, however, this optimization was not applied yet. When using an HTTPFileSystem (e.g. S3 or Azure), missing metadata means DuckDB issues a HEAD request per file to fetch it. Since DuckLake files are immutable, those extra requests are pure overhead, and they really add up when a table has many delete files.
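For background (this sketch is not part of the PR's diff): the Parquet format places its footer at the end of the file, so a reader must know the total file size before it can even seek to the footer. A minimal C++ sketch of that tail read, using plain local-file I/O and a hypothetical ReadParquetFooter helper instead of DuckDB's FileSystem abstraction, and assuming a little-endian host:

#include <cstdint>
#include <cstring>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// A Parquet file ends with: [footer bytes][4-byte little-endian footer length]["PAR1"].
// Knowing file_size up front lets the reader jump straight to this tail; without it,
// an HTTP-backed filesystem must first issue a HEAD request just to learn the size.
std::vector<char> ReadParquetFooter(const std::string &path, uint64_t file_size) {
    std::ifstream file(path, std::ios::binary);
    char tail[8];
    file.seekg(static_cast<std::streamoff>(file_size) - 8);
    file.read(tail, 8);
    if (std::memcmp(tail + 4, "PAR1", 4) != 0) {
        throw std::runtime_error("not a Parquet file");
    }
    uint32_t footer_len;
    std::memcpy(&footer_len, tail, 4); // little-endian host assumed
    std::vector<char> footer(footer_len);
    file.seekg(static_cast<std::streamoff>(file_size) - 8 - footer_len);
    file.read(footer.data(), footer_len);
    return footer;
}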

This PR introduces a custom multi-file reading path for delete files that eagerly injects the metadata we already have, eliminating redundant storage calls.

What Changed

  1. New DeleteFileFunctionInfo struct
    Extends TableFunctionInfo to carry DuckLakeFileData metadata through the table function binding phase (see the sketch after this list).

  2. Custom DeleteFileMultiFileReader
    A specialized MultiFileReader that:

    • Intercepts file list creation
    • Pre-populates ExtendedOpenFileInfo with known metadata:
      • File size (file_size_bytes)
      • ETag (empty placeholder)
      • Last modified timestamp (epoch)
      • Encryption key (when present)
    • Builds a SimpleMultiFileList upfront
    • Overrides CreateFileList() to return this pre-built list, skipping DuckDB’s default discovery logic entirely
  3. Updates to ScanDeleteFile()

    • Switched parquet_scan from a const reference to a mutable copy so it can be customized
    • Attaches DeleteFileFunctionInfo and the custom reader factory to the table function
    • Passes the real parquet_scan function into TableFunctionBindInput (instead of a dummy), ensuring the correct execution context
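To make the first two items concrete, here is a minimal sketch (not the PR's exact code): the "etag" and "last_modified" placeholders match the snippet quoted later in this thread, while the "file_size" and "encryption_key" option keys, the DuckLakeFileData field names, and the PrepareDeleteFile helper are assumptions made for illustration.

// Sketch only: carries DuckLake's per-file metadata through the table function bind phase.
struct DeleteFileFunctionInfo : public TableFunctionInfo {
    explicit DeleteFileFunctionInfo(DuckLakeFileData delete_file_p) : delete_file(std::move(delete_file_p)) {
    }
    //! Metadata DuckLake already stores for this delete file
    DuckLakeFileData delete_file;
};

// Sketch only: build an OpenFileInfo whose ExtendedOpenFileInfo is pre-populated,
// so the filesystem never needs a HEAD request to discover size/etag/last_modified.
static OpenFileInfo PrepareDeleteFile(const DuckLakeFileData &file) {
    OpenFileInfo result(file.path);
    result.extended_info = make_shared_ptr<ExtendedOpenFileInfo>();
    auto &options = result.extended_info->options;
    options["file_size"] = Value::UBIGINT(file.file_size_bytes);  // known size from DuckLake metadata
    options["etag"] = Value("");                                  // empty placeholder; files are immutable
    options["last_modified"] = Value::TIMESTAMP(timestamp_t(0));  // epoch placeholder
    if (!file.encryption_key.empty()) {
        options["encryption_key"] = Value::BLOB_RAW(file.encryption_key);  // assumed key name
    }
    return result;
}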

The win is most noticeable on remote storage backends (S3, Azure, etc.) and for tables with many delete files, where these requests were a significant source of latency.

Net result: faster reads, fewer network round-trips, and better scalability.

Collaborator

@pdet pdet left a comment


Thanks, a couple more comments!

require parquet

statement ok
ATTACH 'ducklake:${DUCKLAKE_CONNECTION}' AS ducklake (DATA_PATH 's3://mybucket')
Collaborator


Other small nit

test-env DATA_PATH __TEST_DIR__

statement ok
ATTACH 'ducklake:${DUCKLAKE_CONNECTION}' AS ducklake (DATA_PATH '${DATA_PATH}/delete_metadata')

This test will still only be executed with minio due to the require-env, but it could eventually be executed in different scenarios in the future.

Contributor Author


Yes, actually we cannot do this because of duckdb/duckdb#20396; once the 1.4 branch updates its dependency to a more recent duckdb 1.4 tag, we'll be able to do this.

Comment on lines 39 to 48
query II
EXPLAIN ANALYZE SELECT COUNT(*) FILTER(WHERE id%2=0) FROM ducklake.test
----
analyzed_plan <REGEX>:.*#HEAD: 0.*

# we can time travel to see the state of the table before deletes
query II
EXPLAIN ANALYZE SELECT COUNT(*) FILTER(WHERE id%2=0) FROM ducklake.test AT (VERSION => 2)
----
analyzed_plan <REGEX>:.*#HEAD: 0.*
Collaborator


I've actually got this branch locally and did a build without your changes in src/storage/ducklake_delete_filter.cpp, and the tests seem to have the same effect? Maybe I'm missing something?

Contributor Author


Hmm 🤔 I'll check this again on behalf of Mathieu who initially opened this MR. Thank you for this @pdet, I should have double-checked as well 🫡


@leo-altertable leo-altertable Jan 14, 2026


@pdet Because of our local development setups we had a lot of trouble reproducing failure / success with and without the fix. That being said, it looks like the base commit we were using might have caused the flakiness you observed.

I went and created a Dockerfile (based on the MinIO.yml workflow config, to be as close as possible to the CI):

# Based on the MinIO.yml workflow configuration
FROM ubuntu:latest

# Prevent interactive prompts during package installation
ENV DEBIAN_FRONTEND=noninteractive

# Install required Ubuntu packages
RUN apt-get update -y -qq && \
    apt-get install -y -qq \
        software-properties-common \
        curl \
        git \
        zip \
        unzip \
        tar \
        pkg-config && \
    add-apt-repository ppa:git-core/ppa && \
    apt-get update -y -qq && \
    apt-get install -y -qq \
        build-essential \
        cmake \
        ninja-build \
        ccache \
        python3 \
        clang \
        llvm && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Set working directory
WORKDIR /workspace

# Copy the project
COPY . .

# Setup vcpkg
RUN git clone https://github.com/microsoft/vcpkg.git /workspace/vcpkg && \
    cd /workspace/vcpkg && \
    git checkout ce613c41372b23b1f51333815feb3edd87ef8a8b && \
    ./bootstrap-vcpkg.sh

# Set build environment variables
ENV VCPKG_TARGET_TRIPLET=x64-linux-release
ENV GEN=ninja
ENV VCPKG_FEATURE_FLAGS=-binarycaching
ENV VCPKG_TOOLCHAIN_PATH=/workspace/vcpkg/scripts/buildsystems/vcpkg.cmake
ENV BUILD_EXTENSION_TEST_DEPS=full
ENV S3_TEST_SERVER_AVAILABLE=1
ENV CC=clang
ENV CXX=clang++

# Show versions for verification
RUN ninja --version && \
    cmake --version && \
    clang --version

# Build the project
RUN make release

# Default command
CMD ["/bin/bash"]

Building the image and running the test (with a MinIO instance running on localhost:9000):

docker build -t ducklake-build .
docker run --network host -it ducklake-build bash -c "S3_TEST_SERVER_AVAILABLE=1 ./build/release/test/unittest --test-config test/configs/minio.json test/sql/delete/delete_metadata.test"

This shows that:

  • Without the ducklake_delete_filter.cpp fix of this PR, the test fails
  • With the fix, the test succeeds.

Comment on lines +30 to +31
extended_info->options["etag"] = Value("");
extended_info->options["last_modified"] = Value::TIMESTAMP(timestamp_t(0));
Collaborator


@samansmink since you are more aware than I am of the effects of S3 files in the multifile reader, can this have any ill effects?

@redox redox marked this pull request as draft January 9, 2026 16:00
Contributor Author

redox commented Jan 13, 2026

Closing this for now; the bug is somewhere between https://github.com/duckdb/ducklake and https://github.com/dentiny/duck-read-cache-fs - we need to figure out how to test this from here if we want to fix it here.

@redox redox closed this Jan 13, 2026
@redox redox reopened this Jan 14, 2026
mchataigner and others added 3 commits January 14, 2026 12:56
…e files

**Context**: We experience slow read performance when a table has many delete files.

**TL;DR**: We can leverage the metadata already available in DuckLake to improve load time of delete files.

**Problem & Motivation:**

DuckLake stores `file_size` metadata for both data and delete files. For data files, there is already a mechanism to forward this metadata to the MultiFileReader and the underlying filesystem. The Parquet reader requires this `file_size` to access the footer metadata. When using an `HTTPFileSystem` instance (e.g., for S3, Azure), it performs a HEAD request on the file if metadata fields (`file_size`, `etag`, `last_modified`) are not present. Since all files in DuckLake are immutable, we can apply the same optimization logic for delete files to avoid these unnecessary HEAD requests.

**Solution:**

Implements a custom multi-file reading solution that pre-populates file metadata to eliminate redundant storage HEAD requests when scanning delete files:

**Key Changes:**

1. **New `DeleteFileFunctionInfo` struct**: Extends `TableFunctionInfo` to carry `DuckLakeFileData` metadata through the table function binding process.

2. **Custom `DeleteFileMultiFileReader` class**:
   - Extends DuckDB's `MultiFileReader` to intercept file list creation
   - Pre-populates `ExtendedOpenFileInfo` with metadata already available from DuckLake:
     - File size (`file_size_bytes`)
     - ETag (empty string as placeholder)
     - Last modified timestamp (set to epoch)
     - Encryption key (if present)
   - Creates a `SimpleMultiFileList` with this extended info upfront
   - Overrides `CreateFileList()` to return the pre-built list, bypassing DuckDB's default file discovery (see the sketch after this list)

3. **Modified `ScanDeleteFile()` method**:
   - Changed `parquet_scan` from const reference to mutable copy to allow modification
   - Attaches `DeleteFileFunctionInfo` and custom reader factory to the table function
   - Passes the actual `parquet_scan` function to `TableFunctionBindInput` instead of a dummy function, ensuring proper function context
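
A rough sketch of the reader described in item 2 above; the exact virtual signature of `CreateFileList()` and the `SimpleMultiFileList` constructor are assumptions based on DuckDB's `MultiFileReader` interface, not the code merged here:

// Sketch only: a MultiFileReader that skips DuckDB's default file discovery and
// returns a pre-built list whose entries already carry size/etag/last_modified metadata.
class DeleteFileMultiFileReader : public MultiFileReader {
public:
    explicit DeleteFileMultiFileReader(vector<OpenFileInfo> files_p) : files(std::move(files_p)) {
    }

    shared_ptr<MultiFileList> CreateFileList(ClientContext &context, const vector<OpenFileInfo> &,
                                             FileGlobOptions) override {
        // Return the pre-built list: no globbing, no HEAD requests.
        return make_shared_ptr<SimpleMultiFileList>(files);
    }

private:
    //! Delete files with ExtendedOpenFileInfo already populated from DuckLake metadata
    vector<OpenFileInfo> files;
};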

**Performance Impact**: Eliminates HEAD requests to object storage when opening Parquet delete files. This is particularly beneficial when working with remote storage (S3, Azure, etc.) and tables with many delete files, where HEAD requests were causing significant performance bottlenecks.
@leo-altertable leo-altertable force-pushed the mbc/improve_scan_delete_files branch from 5ef6bfb to 42a586b Compare January 14, 2026 12:03
@redox redox marked this pull request as ready for review January 14, 2026 12:04
Collaborator

pdet commented Jan 14, 2026

Aha, we actually have a very similar PR in iceberg; code-wise it looks very similar:
https://github.com/duckdb/duckdb-iceberg/pull/579/files

Maybe the test we have there would be a better fit?

