Skip to content

feat(parquet): add ParquetPushDecoder::swap_strategy for adaptive scans#9

Open
adriangb wants to merge 2 commits intomainfrom
adaptive-strategy-swap
Open

feat(parquet): add ParquetPushDecoder::swap_strategy for adaptive scans#9
adriangb wants to merge 2 commits intomainfrom
adaptive-strategy-swap

Conversation

@adriangb
Copy link
Copy Markdown
Member

Not for upstream merge — opened to run CI before landing the DataFusion-side change that depends on this branch (adriangb/datafusion#11).

Summary

Adds a small surface to the push decoder so callers can swap the RowFilter, ProjectionMask, and/or RowSelectionPolicy at row-group boundaries without rebuilding the decoder:

  • pub fn can_swap_strategy(&self) -> bool — true between row groups (outer state ReadingRowGroup, inner state Finished)
  • pub fn swap_strategy(&mut self, swap: StrategySwap) -> Result<()> — rejected with ParquetError::General when called mid-row-group
  • pub struct StrategySwap (#[non_exhaustive]) with builder methods with_projection, with_filter, with_row_selection_policy
  • pub fn row_groups_remaining(&self) -> usize for diagnostics

PushBuffers carries through the swap, so bytes already fetched for columns that survive into the new strategy are reused — only bytes the new strategy needs but that aren't already buffered get requested via NeedsData.

Adaptive callers should drive the decoder with try_next_reader rather than try_decode: handing the active reader off transitions the decoder back to ReadingRowGroup immediately, giving the caller a clean swap window between two consecutive returns. try_decode loops past row-group boundaries internally and is unsuitable for in-flight strategy changes.

Test plan

  • cargo test -p parquet --lib --features arrow,async,object_store — 1121 passed
  • 4 new tests in push_decoder/mod.rs::test:
    • test_swap_strategy_installs_filter_between_row_groups
    • test_swap_strategy_rejected_mid_row_group
    • test_swap_strategy_allowed_while_iterating_handed_off_reader
    • test_swap_strategy_preserves_buffered_bytes
  • CI on this PR

🤖 Generated with Claude Code

adriangb and others added 2 commits April 27, 2026 23:50
Add a small surface to the push decoder so callers can swap the
RowFilter, ProjectionMask, and/or RowSelectionPolicy at row-group
boundaries without rebuilding the decoder:

- new pub fn `can_swap_strategy() -> bool` — true between row groups
  (outer state `ReadingRowGroup`, inner state `Finished`)
- new pub fn `swap_strategy(StrategySwap) -> Result<()>` — rejected
  with `ParquetError::General` when called mid-row-group
- new `pub struct StrategySwap` (`#[non_exhaustive]`) with builder
  methods `with_projection`, `with_filter`, `with_row_selection_policy`
- new pub fn `row_groups_remaining() -> usize` for diagnostics

`PushBuffers` carries through the swap, so bytes already fetched for
columns that survive into the new strategy are reused — only bytes the
new strategy needs but that aren't already buffered get requested via
`NeedsData`.

Adaptive callers should drive the decoder with `try_next_reader`
rather than `try_decode`: handing the active reader off transitions
the decoder back to `ReadingRowGroup` immediately, giving the caller
a clean swap window between two consecutive returns. `try_decode`
loops past row-group boundaries internally and is unsuitable for
in-flight strategy changes.

Tests cover: filter swap between row groups, mid-row-group rejection,
swap-while-iterating-handed-off-reader, and projection narrowing
that reuses already-buffered bytes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`PushBuffers` is `pub(crate)` in the arrow-rs workspace, so the
intra-doc link `[\`PushBuffers\`]: crate::util::push_buffers::PushBuffers`
in `ParquetPushDecoder::swap_strategy`'s public docs failed the
`-D rustdoc::private-intra-doc-links` check on `cargo +nightly doc
--document-private-items`. The rest of the doc is unchanged; the
prose now refers to the type by name without a link.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant