Compact parquet in size-budgeted batches (#933 followup) by erikdarlingdata · Pull Request #959 · erikdarlingdata/PerformanceMonitor

erikdarlingdata · 2026-05-18T13:36:36Z

Summary

After PR #955 landed (main connection cap, raise-for-COPY wrapper, per-pair Binder fix), the reporter's next nightly hit a new failure mode:

Failed to compact 202605/query_snapshots (72 files)
Out of Memory Error: failed to pin block of size 102.9 MiB (3.6 GiB/3.7 GiB used)

This is not the pre-reservation bug from before — DuckDB has legitimately consumed nearly the full 4 GB compaction cap doing real work. The 72-file backlog (accumulated during the broken nightlies) was too large for the pairwise-accumulator merge to handle in one pass: by step 70 of 72, the accumulator is the merge of 71 files combined.

Fix

Cap merges by total on-disk input bytes, not file count. Wide-VARCHAR rows (query_snapshots' plan XML expands ~10x on read) get fewer files per batch; narrow tables fit hundreds.

MaxBatchInputBytes = 200 MB — sized so even ~10x expansion stays well under the 4 GB compaction cap
BuildSizeBudgetedBatches(sortedPaths, maxBytes) — greedy smallest-first bucketing
MergeBatchToFile(table, sourcePaths, outputPath, spillDirSql) — extracted the existing single-pass-vs-pairwise merge logic so each batch is a standalone call
Groups that fit in one batch keep the existing YYYYMM_table.parquet output (backward compatible)
Groups needing multiple batches emit YYYYMM_table_pt001.parquet, _pt002.parquet, etc.; archive views already glob *_table.parquet so readers see them as one logical month
New regex case in file-recognition handles _ptNNN suffix so subsequent compactions round-trip part files correctly
Cleanup is atomic: each batch writes to a temp file; only after all batches succeed do we delete originals and promote temps

Why this is naturally adaptive

The cap is dimensional, not numeric: 200 MB of on-disk compressed parquet is a lot of rows for narrow tables (wait_stats, perfmon_stats) and few rows for wide ones. No table-specific tuning needed.

What this means for the reporter's 72-file backlog

At ~5 MB per file × 72 ≈ 360 MB total on disk. Splits into roughly two batches of ~36 files each. Each batch's in-memory peak is half what the un-batched merge needed — well within 4 GB.

Test plan

Build clean (dotnet build Lite/PerformanceMonitorLite.csproj -c Release)
Local end-to-end: rebuild with ArchiveSizeThresholdMb = 200 (matches our PR Fix #933: cap main DuckDB memory_limit, per-pair compaction exclude detection #955 validation pattern), populate archive with a synthetic backlog of wide-VARCHAR files, trigger compaction, verify _pt001/_pt002 outputs and zero OOM
Verify subsequent compaction correctly re-buckets the part files (the new regex case)
Reporter confirms on the next nightly that their 72-file backlog drains

🤖 Generated with Claude Code

After PR #955 the main connection cap and Binder fix landed, but the reporter's next nightly hit a different failure mode: query_snapshots compaction OOM'd at "failed to pin block of size 102.9 MiB (3.6 GiB/3.7 GiB used)" with a 72-file backlog. Not the pre-reservation bug — DuckDB has legitimately consumed nearly the full 4 GB cap doing real work. Wide-VARCHAR plan XML expands ~10x on read; merging 72 files into a single COPY needs more than 4 GB in memory. The existing pairwise merge accumulates: by step 70 of 72, the accumulator is the merge of 71 files combined being read alongside one more. Bounded if file count is bounded; unbounded otherwise. This change caps a merge by total on-disk input bytes, not file count: - BuildSizeBudgetedBatches greedily groups smallest-first-sorted paths into batches whose total bytes don't exceed MaxBatchInputBytes (200 MB). - A group that fits in one batch keeps the existing YYYYMM_table.parquet output name — fully backward compatible. - A group that needs multiple batches produces YYYYMM_table_ptNNN.parquet part files (numbered from 001). Archive views already glob *_table.parquet, so readers see all parts as one logical month. - New regex case in the file-recognition pass recognizes _ptNNN suffixes so subsequent compactions round-trip part files correctly. - MergeBatchToFile factors out the single-pass-vs-pairwise logic so each batch is a standalone call; the cleanup orchestration (delete originals, promote temps) runs once after all batches succeed. For the reporter's specific 72-file backlog at ~5 MB each (~360 MB total), this produces roughly two batches of ~36 files each. Memory demand of each batch is half what the un-batched merge needed, well within the 4 GB compaction cap. Naturally adaptive: narrow tables (wait_stats, perfmon_stats) fit hundreds of files per batch; wide tables (query_snapshots, query_store_stats with plan XML) get fewer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

erikdarlingdata and others added 2 commits May 18, 2026 09:36

Merge branch 'dev' into feature/933-compaction-size-budget-batching

6013533

erikdarlingdata marked this pull request as ready for review May 18, 2026 14:18

erikdarlingdata mentioned this pull request May 18, 2026

[BUG] Memory usage on client #933

Closed

erikdarlingdata merged commit f41c188 into dev May 18, 2026
2 checks passed

erikdarlingdata deleted the feature/933-compaction-size-budget-batching branch May 18, 2026 14:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Compact parquet in size-budgeted batches (#933 followup)#959

Compact parquet in size-budgeted batches (#933 followup)#959
erikdarlingdata merged 2 commits into
devfrom
feature/933-compaction-size-budget-batching

erikdarlingdata commented May 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

erikdarlingdata commented May 18, 2026

Summary

Fix

Why this is naturally adaptive

What this means for the reporter's 72-file backlog

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant