
Fix #933 — bound compaction memory so wide-row tables don't OOM #942

Merged
erikdarlingdata merged 1 commit into dev from feature/933-compaction-memory-tuning on May 7, 2026

Conversation

@erikdarlingdata (Owner)

Summary

  • #935 ("Lite: fix compaction OOM by setting DuckDB temp_directory", #933) added temp_directory so DuckDB could spill during compaction, but on wider workloads the working set still blew past the 4 GB cap before spilling caught up (the reporter saw an OOM at 3.7 GiB while compacting 15 query_snapshots files).
  • Three knobs combined to feed that: memory_limit = 4 GB was too high (DuckDB held off spilling), threads defaulted to N cores (per-thread row-group buffers multiplied), and ROW_GROUP_SIZE 122880 buffered up to 122k wide-VARCHAR rows per group.
  • Fix: drop memory_limit to 1 GB, cap threads at 2, and shrink ROW_GROUP_SIZE to 8192 (see the settings sketch below). Memory now plateaus instead of growing with row count.
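For reference, the three knobs correspond to two DuckDB session settings and a parquet COPY option. A minimal sketch of the NEW configuration, with placeholder paths and query rather than the actual compaction code:

```sql
-- Bound the buffer manager and point spill at a scratch directory
SET memory_limit = '1GB';
SET threads = 2;
SET temp_directory = '/tmp/duckdb_spill';  -- placeholder path

-- Smaller row groups bound how many wide-VARCHAR rows buffer per group
COPY (SELECT * FROM read_parquet('chunks/*.parquet'))
TO 'merged.parquet' (FORMAT PARQUET, ROW_GROUP_SIZE 8192);
```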

Fixes #933

Repro tool

tools/CompactionRepro — a standalone .NET console app that splits a real monthly parquet file into N per-cycle-shaped chunks and runs the same pair-merge logic, with the tuning knobs exposed on the command line. Useful for validating future changes to compaction.
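A sketch of how a run might be invoked; the flag names below are illustrative assumptions, since the PR doesn't list the tool's actual options (only the follow-up commit names concrete flags):

```sh
# Hypothetical invocation: --input/--chunks and the tuning-knob flag
# names are assumptions; substitute the tool's real options
dotnet run --project tools/CompactionRepro -- \
  --input 202604_query_stats.parquet --chunks 15 \
  --memory-limit 1GB --threads 2 --row-group-size 8192
```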

Validation

On a real local archive (202604_query_stats.parquet, 1.7M rows, ~70 MB):

| Setting | Peak Working Set | Wall Time | Output |
| --- | --- | --- | --- |
| OLD (4 GB / default threads / 122880) | 1236 MB | 12.0 s | 68.3 MB |
| NEW (1 GB / 2 threads / 8192) | 166 MB | 15.7 s | 77.6 MB |

An 87% reduction in peak memory for a 31% slower wall time. Output is 14% larger (smaller row groups mean smaller compression dictionaries), an acceptable trade for not crashing.

Test plan

  • Lite builds clean (0 errors)
  • Repro tool reproduces under OLD settings, succeeds under NEW settings on real archive data
  • Reporter validates against their query_snapshots workload in the next nightly

🤖 Generated with Claude Code

#935 added temp_directory so DuckDB could spill, but on wider workloads
the working set still blew past the 4 GB cap before spill caught up
(reporter saw OOM at 3.7 GiB compacting 15 query_snapshots files).
Three knobs combined to feed that:

- memory_limit = 4 GB was too high — DuckDB held off spilling until late
- threads defaulted to N cores, multiplying per-thread row-group buffers
- ROW_GROUP_SIZE 122880 buffered up to 122k wide-VARCHAR rows per group

Drop memory_limit to 1 GB, cap threads to 2, and shrink ROW_GROUP_SIZE
to 8192. On 1.7 M rows of real query_stats data this drops peak working
set from 1236 MB → 166 MB (87% reduction) at a 31% wall-time cost.
Memory now plateaus instead of growing with row count, which is the
load-bearing change for issue #933.

Adds tools/CompactionRepro — a standalone reproducer that splits a real
monthly parquet file into N per-cycle-shaped chunks and runs the same
pair-merge logic with the tuning knobs exposed on the command line.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
erikdarlingdata merged commit 46dd1e5 into dev on May 7, 2026
2 checks passed
erikdarlingdata added a commit that referenced this pull request May 12, 2026
#942 lowered the cap to 1 GB on the theory that a tight memory_limit plus
temp_directory would force DuckDB to spill earlier and keep peak working
set down. That validation ran against query_stats (narrow, ~1.7M rows) and
showed peak 1236 MB → 166 MB. The reporter's actual failure is on
query_snapshots, which carries query_text + query_plan + live_query_plan
per row. With the 1 GB cap, the nightly logs show OOM at "906/953 MiB
used" before any merge progress.

The standalone reproducer (tools/CompactionRepro) confirms the cause:
parquet COPY in DuckDB v1.5.2 makes allocations that bypass the buffer
manager and can't be spilled. The cap acts as a hard ceiling for those,
not a spill trigger. Spill on disk = 0 MB across every configuration we
tested (memory_limit 1/2/4 GB, accumulator vs tournament merge, threads
1 vs 2, :memory: vs file-backed DB). The same failure reproduces in
standalone DuckDB CLI v1.5.2, so it's an engine issue — see upstream
issues duckdb#16482 and discussion#10084.
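For anyone re-running this, one way to confirm the zero-spill observation from inside a DuckDB session; duckdb_temporary_files() and current_setting() are standard catalog functions in recent DuckDB releases, but treat this as a sketch, not the reproducer's actual check:

```sql
-- Confirm the session settings actually took effect
SELECT current_setting('memory_limit')   AS memory_limit,
       current_setting('temp_directory') AS temp_directory;

-- List any temp files the buffer manager has spilled to disk;
-- per the runs above this stayed empty (0 MB) in every configuration
SELECT * FROM duckdb_temporary_files();
```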

DuckDB's own OOM guide explicitly warns about this case and recommends
memory_limit at 50-60% of system RAM, not a tight cap. 4 GB sits well
inside that range for typical workstation/server hosts and leaves real
headroom on top of the un-spillable allocations.
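In session terms the revert just raises the cap back in line with that guidance; a sketch, assuming a host with roughly 8 GB of RAM:

```sql
-- Follow DuckDB's OOM guidance (memory_limit at ~50-60% of system RAM)
-- rather than a tight cap; 4GB assumes roughly 8 GB available
SET memory_limit = '4GB';
-- temp_directory stays set per #935, even though the
-- buffer-manager-bypassing allocations never use it
SET temp_directory = '/tmp/duckdb_spill';  -- placeholder path
```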

Reporter's actual file sizes (15-25 chunks of 2-6 MB plus a 35-45 MB
monthly file per group) are well below the level where 4 GB has any
trouble. The reproducer confirms 4 GB succeeds on a synthetic
query_snapshots-shaped dataset of ~1.5 GB with peak working set of
~400 MB; the reporter's data is ~143 MB at worst.

Also updates the stale comment about spilling — temp_directory was set
per #935 but the buffer-manager-bypassing allocations don't use it. The
comment now describes what actually happens.

The tools/CompactionRepro changes add --strategy {accumulator|tournament},
--db-mode {memory|file}, --merge-files, --synthetic data generation, and
--cycles for leak testing. These are kept so a future regression in this
area can be reproduced and diagnosed quickly.
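For example, a leak-test run over synthetic data might compose the new options like this (the flag names come from the list above; the values and how the flags combine are assumptions):

```sh
# Flag names from the commit; values and composition are illustrative
dotnet run --project tools/CompactionRepro -- \
  --strategy tournament --db-mode file \
  --synthetic --merge-files 15 --cycles 10
```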

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
