
Compact parquet in size-budgeted batches (#933 followup) #959

Merged: erikdarlingdata merged 2 commits into dev from feature/933-compaction-size-budget-batching on May 18, 2026


@erikdarlingdata
Owner

Summary

After PR #955 landed (main connection cap, raise-for-COPY wrapper, per-pair Binder fix), the reporter's next nightly hit a new failure mode:

Failed to compact 202605/query_snapshots (72 files)
Out of Memory Error: failed to pin block of size 102.9 MiB (3.6 GiB/3.7 GiB used)

This is not the pre-reservation bug from before — DuckDB has legitimately consumed nearly the full 4 GB compaction cap doing real work. The 72-file backlog (accumulated during the broken nightlies) was too large for the pairwise-accumulator merge to handle in one pass: by step 70 of 72, the accumulator is the merge of 71 files combined.

Fix

Cap merges by total on-disk input bytes, not file count. Tables with wide VARCHAR columns (query_snapshots' plan XML expands ~10x on read) get fewer files per batch; narrow tables fit hundreds.

  • MaxBatchInputBytes = 200 MB — sized so even ~10x expansion stays well under the 4 GB compaction cap
  • BuildSizeBudgetedBatches(sortedPaths, maxBytes) — greedy smallest-first bucketing
  • MergeBatchToFile(table, sourcePaths, outputPath, spillDirSql) — extracted the existing single-pass-vs-pairwise merge logic so each batch is a standalone call
  • Groups that fit in one batch keep the existing YYYYMM_table.parquet output (backward compatible)
  • Groups needing multiple batches emit YYYYMM_table_pt001.parquet, _pt002.parquet, etc.; archive views already glob *_table.parquet so readers see them as one logical month
  • New regex case in file-recognition handles _ptNNN suffix so subsequent compactions round-trip part files correctly
  • Cleanup is atomic: each batch writes to a temp file; only after all batches succeed do we delete originals and promote temps
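
The greedy smallest-first bucketing described above can be sketched roughly as follows. This is a minimal Python illustration, not the PR's actual `BuildSizeBudgetedBatches` code: the function name, the `(path, size)` tuple input, the reading of "200 MB" as 200 MiB, and the choice to give a single over-budget file its own batch are all assumptions made for the example.

```python
from typing import List, Tuple

# Assumption: "200 MB" is read as 200 MiB for this sketch.
MAX_BATCH_INPUT_BYTES = 200 * 1024 * 1024


def build_size_budgeted_batches(
    files: List[Tuple[str, int]],
    max_bytes: int = MAX_BATCH_INPUT_BYTES,
) -> List[List[str]]:
    """Group (path, size_in_bytes) pairs into batches, smallest files first.

    A batch is closed as soon as adding the next file would push its total
    on-disk size over max_bytes. A single file larger than max_bytes still
    gets a batch of its own (an assumption; the PR does not say).
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0

    # Sort by size, then by path so the result is deterministic.
    for path, size in sorted(files, key=lambda f: (f[1], f[0])):
        if current and current_bytes + size > max_bytes:
            batches.append(current)
            current, current_bytes = [], 0
        current.append(path)
        current_bytes += size

    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # Hypothetical backlog: 72 files of exactly 5 MiB each.
    backlog = [(f"file_{i:03d}.parquet", 5 * 1024 * 1024) for i in range(72)]
    for n, batch in enumerate(build_size_budgeted_batches(backlog), start=1):
        print(f"batch {n}: {len(batch)} files")
```

With perfectly uniform 5 MiB files this sketch fills the first batch to the budget and leaves the remainder for the second, so the split is not exactly even; real file sizes vary, and the actual implementation may differ.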

Why this is naturally adaptive

The cap is expressed in bytes rather than as a file count: 200 MB of on-disk compressed parquet is a lot of rows for narrow tables (wait_stats, perfmon_stats) and few rows for wide ones. No table-specific tuning needed.

What this means for the reporter's 72-file backlog

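At ~5 MB per file, the 72 files total ≈ 360 MB on disk, which splits into two batches under the 200 MB budget (roughly 36 files each; the exact split depends on individual file sizes). Each batch's in-memory peak is roughly half what the un-batched merge needed, which is well within the 4 GB cap.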
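
The headroom arithmetic behind these numbers can be checked with a short back-of-the-envelope script. It is only a sketch: the figures come from the PR text (~5 MB files, ~10x expansion, 200 MB budget, 4 GB cap), the variable names are made up for the example, and treating MB/GB as MiB/GiB and the 10x factor as a uniform multiplier are simplifying assumptions.

```python
import math

MIB = 1024 * 1024
GIB = 1024 * MIB

# Figures quoted in the PR; units read as MiB/GiB (assumption).
file_count = 72
avg_file_bytes = 5 * MIB            # "~5 MB per file"
max_batch_input_bytes = 200 * MIB   # MaxBatchInputBytes
expansion_factor = 10               # "~10x on read", treated as uniform
compaction_cap_bytes = 4 * GIB      # compaction memory cap

total_on_disk = file_count * avg_file_bytes
min_batches = math.ceil(total_on_disk / max_batch_input_bytes)

# Rough in-memory estimates: on-disk bytes times the expansion factor.
worst_case_batch = max_batch_input_bytes * expansion_factor
unbatched_estimate = total_on_disk * expansion_factor

print(f"total on disk:        {total_on_disk / MIB:.0f} MiB")
print(f"minimum batches:      {min_batches}")
print(f"full batch, expanded: {worst_case_batch / GIB:.2f} GiB")
print(f"un-batched, expanded: {unbatched_estimate / GIB:.2f} GiB")
print(f"batch fits under cap: {worst_case_batch < compaction_cap_bytes}")
```

Under these assumptions a completely full batch expands to roughly half the cap, while the un-batched estimate sits close to it. DuckDB's real memory use depends on far more than a single multiplier, so this is a sanity check rather than a prediction.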

Test plan

  • Build clean (dotnet build Lite/PerformanceMonitorLite.csproj -c Release)
  • Local end-to-end: rebuild with ArchiveSizeThresholdMb = 200 (matches the validation pattern from PR #955, "Fix #933: cap main DuckDB memory_limit, per-pair compaction exclude detection"), populate archive with a synthetic backlog of wide-VARCHAR files, trigger compaction, verify _pt001/_pt002 outputs and zero OOM
  • Verify subsequent compaction correctly re-buckets the part files (the new regex case)
  • Reporter confirms on the next nightly that their 72-file backlog drains
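
The round-trip check for part files (single-batch groups keep `YYYYMM_table.parquet`, multi-batch groups get `_ptNNN` suffixes, and the recognizer must parse both) could be exercised with something like the following. It is a hypothetical Python sketch: the regex, `output_names`, and `parse_name` are illustrative names and patterns, not the project's actual file-recognition code.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical pattern: six-digit month, table name, optional _ptNNN suffix.
# The table group is lazy so a trailing _ptNNN is captured as the part number.
ARCHIVE_NAME_RE = re.compile(
    r"^(?P<month>\d{6})_(?P<table>.+?)(?:_pt(?P<part>\d{3}))?\.parquet$"
)


def output_names(month: str, table: str, batch_count: int) -> List[str]:
    """Names a compaction would emit for a month/table group."""
    if batch_count <= 1:
        # Single batch: keep the existing, backward-compatible name.
        return [f"{month}_{table}.parquet"]
    # Multiple batches: part files numbered from 001.
    return [f"{month}_{table}_pt{n:03d}.parquet" for n in range(1, batch_count + 1)]


def parse_name(file_name: str) -> Optional[Tuple[str, str, Optional[int]]]:
    """Return (month, table, part) or None if the name is not recognized."""
    m = ARCHIVE_NAME_RE.match(file_name)
    if m is None:
        return None
    part = m.group("part")
    return m.group("month"), m.group("table"), (int(part) if part else None)


if __name__ == "__main__":
    for name in output_names("202605", "query_snapshots", 2):
        print(name, "->", parse_name(name))
```

One caveat of this particular pattern: a table whose own name ends in `_ptNNN` would be misread as a part file, so a real implementation may need a stricter rule.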

🤖 Generated with Claude Code

erikdarlingdata and others added 2 commits May 18, 2026 09:36
After PR #955 the main connection cap and Binder fix landed, but the
reporter's next nightly hit a different failure mode: query_snapshots
compaction OOM'd at "failed to pin block of size 102.9 MiB
(3.6 GiB/3.7 GiB used)" with a 72-file backlog. Not the pre-reservation
bug — DuckDB has legitimately consumed nearly the full 4 GB cap doing
real work. Wide-VARCHAR plan XML expands ~10x on read; merging 72 files
into a single COPY needs more than 4 GB in memory.

The existing pairwise merge accumulates: by step 70 of 72, the
accumulator is the merge of 71 files combined being read alongside one
more. Bounded if file count is bounded; unbounded otherwise.

This change caps a merge by total on-disk input bytes, not file count:

- BuildSizeBudgetedBatches greedily groups smallest-first-sorted paths
  into batches whose total bytes don't exceed MaxBatchInputBytes (200 MB).
- A group that fits in one batch keeps the existing
  YYYYMM_table.parquet output name — fully backward compatible.
- A group that needs multiple batches produces YYYYMM_table_ptNNN.parquet
  part files (numbered from 001). Archive views already glob
  *_table.parquet, so readers see all parts as one logical month.
- New regex case in the file-recognition pass recognizes _ptNNN suffixes
  so subsequent compactions round-trip part files correctly.
- MergeBatchToFile factors out the single-pass-vs-pairwise logic so each
  batch is a standalone call; the cleanup orchestration (delete
  originals, promote temps) runs once after all batches succeed.

For the reporter's specific 72-file backlog at ~5 MB each (~360 MB
total), this produces roughly two batches of ~36 files each. Memory
demand of each batch is half what the un-batched merge needed, well
within the 4 GB compaction cap.

Naturally adaptive: narrow tables (wait_stats, perfmon_stats) fit
hundreds of files per batch; wide tables (query_snapshots,
query_store_stats with plan XML) get fewer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@erikdarlingdata erikdarlingdata marked this pull request as ready for review May 18, 2026 14:18
@erikdarlingdata erikdarlingdata merged commit f41c188 into dev May 18, 2026
2 checks passed
@erikdarlingdata erikdarlingdata deleted the feature/933-compaction-size-budget-batching branch May 18, 2026 14:31
