feat: add micro_commit_batch_size param in lance_compaction function #6141

Open
huleilei wants to merge 1 commit into Eventual-Inc:main from huleilei:hll/compact_1
Conversation

@huleilei
Collaborator

@huleilei huleilei commented Feb 9, 2026

Changes Made

Introduce micro_commit_batch_size in daft.io.lance.compact_files and thread it through to lance_compaction to control commit batching during compaction. The docs (docs/connectors/lance.md) are updated and the tests (tests/io/lancedb/test_lancedb_compaction.py) are extended. When all tasks are committed in a single batch, compaction returns CompactionMetrics; when commits are split across multiple batches, it returns None. There are no breaking changes, and the risk is minimal because the parameter is additive and optional.
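
For reviewers, the return-value rule described above can be illustrated with a small, self-contained sketch. It is not the code in this PR: `SketchMetrics`, `split_into_batches`, and `commit_compaction` are hypothetical names, tasks are represented as plain strings, and the handling of empty task lists and non-positive batch sizes is an assumption that the description does not cover.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class SketchMetrics:
    """Hypothetical stand-in for CompactionMetrics; not the real Lance class."""

    tasks_committed: int


def split_into_batches(tasks: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split tasks into consecutive batches of at most batch_size items."""
    return [list(tasks[i : i + batch_size]) for i in range(0, len(tasks), batch_size)]


def commit_compaction(
    tasks: Sequence[str], micro_commit_batch_size: Optional[int] = None
) -> Optional[SketchMetrics]:
    """Commit tasks in one batch (returns metrics) or several (returns None)."""
    if micro_commit_batch_size is not None and micro_commit_batch_size <= 0:
        # Assumption: non-positive sizes are rejected; the PR does not say.
        raise ValueError("micro_commit_batch_size must be a positive integer")
    if not tasks:
        # Assumption: nothing to compact means nothing to report.
        return None

    # Omitting the parameter keeps the single-commit behaviour.
    batch_size = micro_commit_batch_size or len(tasks)
    batches = split_into_batches(tasks, batch_size)

    if len(batches) == 1:
        # Single batch: one commit, so one set of metrics can be returned.
        return SketchMetrics(tasks_committed=len(batches[0]))

    # Multiple batches: each batch would be its own commit, so there is
    # no single aggregate metrics object to hand back.
    for batch in batches:
        pass  # a real implementation would commit `batch` here
    return None
```

Under these assumptions, callers that need metrics should either omit micro_commit_batch_size or pick a value at least as large as the number of tasks, and callers that pass a smaller value should be prepared to receive None.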

Related Issues

@github-actions github-actions bot added the feat label Feb 9, 2026
@greptile-apps
Contributor

greptile-apps bot commented Feb 9, 2026

Greptile Overview

Greptile Summary

This PR adds a new micro_commit_batch_size parameter to the Lance compaction API, threads it through the Python entrypoint into the internal lance_compaction implementation, updates the Lance connector documentation, and extends the LanceDB compaction tests to exercise the new argument.

The change fits cleanly into the existing Lance connector surface area: it’s an additive parameter on the compaction helper and is passed through to the underlying LanceDB/Lance optimize/compaction call so users can tune commit batching behavior during compaction.

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk.
  • The changes are straightforward, additive parameter plumbing with corresponding documentation and test coverage; no behavioral change occurs unless the new parameter is provided.
  • No files require special attention

Important Files Changed

| Filename | Overview |
| --- | --- |
| daft/io/lance/_lance.py | Adds a micro_commit_batch_size parameter to the Python Lance compaction entrypoint and threads it through to the underlying compaction call. |
| daft/io/lance/lance_compaction.py | Extends lance_compaction to accept micro_commit_batch_size and passes it into the LanceDB optimize/compaction invocation. |
| docs/connectors/lance.md | Documents the new micro_commit_batch_size parameter for Lance compaction usage. |
| tests/io/lancedb/test_lancedb_compaction.py | Adds/updates tests to cover the new micro_commit_batch_size parameter being accepted and plumbed through compaction. |

Sequence Diagram

sequenceDiagram
    participant U as User
    participant Py as daft.io.lance._lance
    participant LC as daft.io.lance.lance_compaction
    participant Lance as LanceDB/Lance

    U->>Py: compact_files(..., micro_commit_batch_size=K)
    Py->>LC: lance_compaction(..., micro_commit_batch_size=K)
    LC->>Lance: optimize/compact(..., micro_commit_batch_size=K)
    Lance-->>LC: compaction result
    LC-->>Py: return result
    Py-->>U: return result

Contributor

@greptile-apps greptile-apps bot left a comment

4 files reviewed, no comments

@codecov

codecov bot commented Feb 10, 2026

Codecov Report

❌ Patch coverage is 94.73684% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 73.41%. Comparing base (de28b9b) to head (8810cdd).
⚠️ Report is 3 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| daft/io/lance/lance_compaction.py | 94.73% | 1 Missing ⚠️ |
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #6141      +/-   ##
==========================================
+ Coverage   73.38%   73.41%   +0.02%     
==========================================
  Files         990      993       +3     
  Lines      128853   129163     +310     
==========================================
+ Hits        94557    94823     +266     
- Misses      34296    34340      +44     
| Files with missing lines | Coverage Δ |
| --- | --- |
| daft/io/lance/_lance.py | 92.10% <ø> (+0.92%) ⬆️ |
| daft/io/lance/lance_compaction.py | 89.74% <94.73%> (-0.89%) ⬇️ |

... and 9 files with indirect coverage changes

@huleilei
Collaborator Author

@universalmind303 @Jay-ju could you please review this when convenient? Thanks.


dataset = lance.dataset(str(dataset_path))
post_fragments = len(dataset.get_fragments())
post_rows = dataset.count_rows()
Collaborator

Isn't the check here rather unconvincing? Shouldn't the test verify that micro_commit_batch_size actually had an effect?

Collaborator Author

This test case is mainly meant to validate that compaction works when the micro_commit_batch_size parameter is set. Committing in batches during compaction can create multiple Lance dataset versions, and in the current LanceDB implementation the number of versions produced by batch compaction is not consistent because of scenarios such as data conflicts. I therefore believe that only the functionality needs to be validated here.
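
To make the scope of that validation concrete, the checks can be sketched as a small helper. This is an illustration, not the test in this PR: `compaction_problems` is a hypothetical name, and the expectation that the fragment count drops assumes the test dataset starts with several small fragments. The number of dataset versions is deliberately not checked, for the reason given above.

```python
from typing import List


def compaction_problems(
    pre_fragments: int, post_fragments: int, pre_rows: int, post_rows: int
) -> List[str]:
    """Return a list of problems found after compaction (empty means OK)."""
    problems = []
    if post_rows != pre_rows:
        # Compaction rewrites fragments; it must not add or drop rows.
        problems.append(f"row count changed: {pre_rows} -> {post_rows}")
    if post_fragments >= pre_fragments:
        # Assumption: the dataset had several small fragments to merge.
        problems.append(
            f"fragment count did not decrease: {pre_fragments} -> {post_fragments}"
        )
    # The version count is intentionally not inspected: with batched
    # commits it can vary from run to run.
    return problems
```

A test could then assert that `compaction_problems(pre_fragments, post_fragments, pre_rows, post_rows)` is an empty list, using counts read from the dataset before and after compaction.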
