
Add Lance annotation writer stage #2113

Draft
VibhuJawa wants to merge 4 commits into NVIDIA-NeMo:main from VibhuJawa:feat/lance-annotation-writer


@VibhuJawa (Contributor)

Split from #2106. This is PR 3 of 3 in the Lance IO stack.

Stacked on #2112. The branch is based on feat/lance-writer, so until #2111 and #2112 merge, GitHub may also show changes from the lower PRs in the stack when this PR is viewed against main. For the exact PR 3 delta, compare:
VibhuJawa/NeMo-Curator@feat/lance-writer...feat/lance-annotation-writer

What changed:

  • Adds LanceAnnotationWriter for sparse updates to existing Lance rows using LanceReader metadata columns.
  • Adds prepare() to create/validate annotation columns and pin the read version.
  • Adds commit_lance_annotation_checkpoint for publishing updated fragments.
  • Adds tests for sparse annotation updates, duplicate row rejection, and split-fragment checkpoint rejection.
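The bullets above describe the update and checkpoint rules only in prose. As a minimal, library-free sketch of two of them (duplicate-row rejection and split-fragment rejection), assuming each annotated record carries hypothetical `fragment_id` and `row_offset` keys standing in for the LanceReader metadata columns: the code does not call the Lance API, and `FragmentUpdate`, `plan_sparse_update`, and `check_checkpoint` are hypothetical names, not the PR's implementation. The version check is likewise an assumed, conservative policy.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Set


@dataclass
class FragmentUpdate:
    """Hypothetical per-fragment batch of sparse annotation values."""

    fragment_id: int
    # Row offset within the fragment -> annotation values for that row.
    rows: Dict[int, Dict[str, Any]] = field(default_factory=dict)


def plan_sparse_update(
    records: Iterable[Dict[str, Any]], annotation_columns: List[str]
) -> Dict[int, FragmentUpdate]:
    """Group annotated records by fragment and reject duplicate rows.

    Assumes hypothetical "fragment_id" and "row_offset" keys on each record,
    standing in for the LanceReader metadata columns.
    """
    updates: Dict[int, FragmentUpdate] = {}
    for record in records:
        missing = [c for c in annotation_columns if c not in record]
        if missing:
            raise ValueError(f"record is missing annotation columns: {missing}")
        frag, offset = record["fragment_id"], record["row_offset"]
        update = updates.setdefault(frag, FragmentUpdate(frag))
        # Annotating the same physical row twice would make the write ambiguous.
        if offset in update.rows:
            raise ValueError(f"duplicate row: fragment={frag}, offset={offset}")
        update.rows[offset] = {c: record[c] for c in annotation_columns}
    return updates


def check_checkpoint(
    task_fragments: Dict[str, Set[int]], pinned_version: int, current_version: int
) -> Dict[int, str]:
    """Validate a checkpoint before publishing rewritten fragments.

    task_fragments maps a hypothetical task id to the fragment ids it rewrote.
    Returns a mapping of fragment id -> owning task id.
    """
    # Assumed policy: refuse to publish if the dataset moved past the version
    # pinned in prepare(), since row offsets were read at that version.
    if current_version != pinned_version:
        raise RuntimeError(
            f"dataset is at version {current_version}, expected {pinned_version}"
        )
    owner: Dict[int, str] = {}
    for task_id, fragments in task_fragments.items():
        for frag in fragments:
            # A fragment split across tasks would produce competing rewrites.
            if frag in owner:
                raise ValueError(
                    f"fragment {frag} split across tasks {owner[frag]!r} and {task_id!r}"
                )
            owner[frag] = task_id
    return owner


updates = plan_sparse_update(
    [
        {"fragment_id": 0, "row_offset": 3, "quality": 0.9},
        {"fragment_id": 1, "row_offset": 0, "quality": 0.2},
    ],
    annotation_columns=["quality"],
)
print(sorted(updates))  # [0, 1]
```

Grouping by fragment means each fragment's rewrite comes from exactly one task; one plausible rationale for rejecting split-fragment checkpoints, rather than merging them, is that two tasks each rewriting the same fragment would produce conflicting replacements for it.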

Validation:

  • uv run --extra lance --group test pytest -q tests/stages/text/io/reader/test_lance.py tests/stages/text/io/writer/test_lance.py
  • uv run ruff check nemo_curator/stages/text/io/writer/__init__.py nemo_curator/stages/text/io/writer/lance.py tests/stages/text/io/writer/test_lance.py
  • git diff --check

@copy-pr-bot

copy-pr-bot Bot commented Jun 24, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@VibhuJawa VibhuJawa force-pushed the feat/lance-annotation-writer branch 5 times, most recently from e98ee3b to 8d3ff7f (June 24, 2026 22:16)
Signed-off-by: Vibhu Jawa <vjawa@nvidia.com>
@VibhuJawa VibhuJawa force-pushed the feat/lance-annotation-writer branch 8 times, most recently from 0166ab8 to 2c093d0 (June 24, 2026 23:38)
Signed-off-by: Vibhu Jawa <vjawa@nvidia.com>
@VibhuJawa VibhuJawa force-pushed the feat/lance-annotation-writer branch from 2c093d0 to 0ab3898 (June 24, 2026 23:41)
Signed-off-by: Vibhu Jawa <vjawa@nvidia.com>
@VibhuJawa VibhuJawa force-pushed the feat/lance-annotation-writer branch from 0ab3898 to 62134b3 (June 25, 2026 00:12)
Signed-off-by: Vibhu Jawa <vjawa@nvidia.com>
@VibhuJawa VibhuJawa force-pushed the feat/lance-annotation-writer branch from 62134b3 to 4d819c9 (June 25, 2026 00:13)
