perf: route `BitVec.flattenList` through the chunked `Array` merge core by kim-em · Pull Request #14097 · leanprover/lean4

kim-em · 2026-06-18T04:20:42Z

This PR makes the compiled implementation of BitVec.flattenList share the chunked Array (Nat × Nat) balanced-merge core (mergePass/treeMerge) that BitVec.ofBoolListLE uses, replacing the divide-and-conquer flattenList.toNatAux as the @[csimp] target. The new BitVec.Internal.flattenListImpl packs chunkCap n = max 1 (64 / n) width-n values per ~64-bit chunk (packChunkBV, collectChunksBV), then tree-merges. This recovers O(1) stack usage and removes the per-node take/drop sublist allocation of the previous O(log L)-depth divide-and-conquer worker; for small widths the packing also collapses the bottom log₂ (chunkCap n) levels of the merge tree into single machine-word operations.

Correctness reuses the existing Array-core lemmas (treeMerge_eq_flattenList, flattenList_append, flattenList_pack): a chunk-local flattenList_collectChunksBV identity composes with a toNat_flattenList_eq bridge reconciling the head-high orientation of BitVec.flattenList with the head-low Nat × Nat spec (mirroring the reverse already used by ofBoolListBE). The now-superseded divide-and-conquer flattenListFast/flattenList.toNatAux and the flattenList_eq_flattenListFast theorem are removed.

Blocked by #13576 (fix: tail-recursive BitVec.ofBoolListLE/ofBoolListBE to avoid stack overflow): this branch is stacked on it and reuses its BitVec.Internal Array merge core, so the diff shown here includes that PR's changes until it merges. Review/merge #13576 first; this PR should then be retargeted/rebased onto the result.

🤖 Prepared with Claude Code

…ck overflow This PR adds tail-recursive replacements for `BitVec.ofBoolListLE` and `BitVec.ofBoolListBE`, registered via `@[csimp]`, to avoid stack overflow on lists with ~1M elements. The reference definitions in `Init.Data.BitVec.Basic` recurse via `concat`, which is clean for proofs but allocates O(n) stack frames. The new implementations in `Init.Data.BitVec.Impl` pack bits in 64-bit chunks (`packChunk`, `collectChunks`) and combine them via a balanced tree merge (`mergePass`, `treeMerge`), giving O(n log n) work and O(1) stack usage. Correctness is established via a list-level spec function `flattenList` giving the intended Nat semantics of `(value, width)` pairs, with `flattenList_append`, `flattenList_mergePassList` (key bit-packing identity), and a chunk-local `testBit_flattenList_collectChunks_aux`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

….Internal` Codex review feedback applied: - Mark `packChunk`, `collectChunks`, `mergePass`, `treeMerge` as `private` — these have unreachable-fuel branches that make them bad public contracts. - Mark proof-only scaffolding (`flattenList`, `totalWidth`, `WellFormedList`, `mergePassList`) as `private` to avoid exposing them as API. - Extract `half_le_pow_of_le_double` (the `arr.size ≤ 2^(k+1) → (arr.size+1)/2 ≤ 2^k` bound used in the `treeMerge` halving step) into a standalone lemma. This isolates the omega/`Int.pow_succ` workaround to one place and turns `treeMerge_go_eq_flattenList`'s arithmetic step into a one-liner. Public surface is now just `ofBoolListLEImpl`, `ofBoolListBEImpl`, and the two `@[csimp]` theorems. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replace the file-wide `public section` with `public` modifiers on just the two `ofBoolListLEImpl`/`ofBoolListBEImpl` defs and the two `@[csimp]` theorems. Everything else (the chunked-encoding helpers `packChunk`, `collectChunks`, `mergePass`, `treeMerge`, the proof-only `flattenList`/`totalWidth`/ `WellFormedList`/`mergePassList`, and all auxiliary lemmas) is now file-local. Also drop redundant imports `Init.Data.Nat.Lemmas` (transitive via `Init.Data.Array.Lemmas`) and `Init.Data.List.Lemmas` (transitive via `Init.Data.List.Nat.TakeDrop`). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

This PR marks `BitVec.ofBoolListLE` and `BitVec.ofBoolListBE` as `noncomputable`, so the `@[csimp]` lemmas pointing them at their tail-recursive implementations always take effect and there is no risk of compiling against the non-tail-recursive reference definitions. This also keeps the `elab/csimpCore.lean` test invariant that `@[csimp]` is only applied to `noncomputable` defs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…tenList` identities This PR rewrites the correctness proofs in `Init.Data.BitVec.Impl` so that every step is an unconditional `Nat` identity, removing the `WellFormedList` invariant entirely. The key observation is that `flattenList_pack` — merging two adjacent `(value, width)` fields — is pure `|||`/`<<<` algebra and needs no `value < 2^width` hypothesis. With that hypothesis gone, `flattenList_mergePassList`, `treeMerge_eq_flattenList`, and `flattenList_collectChunks` no longer need well-formedness either, so `WellFormedList`, `flattenList_lt`, `mergePassList_wellFormed`, `collectChunks_wellFormed`, and the bespoke `testBit`-indexing lemmas all become dead code. The runtime definitions (`packChunk`, `collectChunks`, `mergePass`, `treeMerge`, `ofBoolListLEImpl`, `ofBoolListBEImpl`) are unchanged; benchmarked speed and all test results are identical. `Impl.lean` shrinks from 680 to 450 lines. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

This PR shortens several proofs in `Init.Data.BitVec.Impl` without touching the runtime definitions: it replaces hand-rolled `toList_size_one`/`toList_size_zero` proofs with the existing `Array.size_eq_one_iff`/`List.length_eq_zero_iff`, collapses the `cases r` branches of `packChunk_used`/`packChunk_rest`, drops the trivial `mergePassList_nil`/`mergePassList_singleton` lemmas in favour of inline `simp`, and golfs `mergePass_size`, `totalWidth_map_leaf`, and `getLsbD_ofBoolListLEImpl`. `Impl.lean` shrinks from 450 to 426 lines. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

mathlib-lean-pr-testing · 2026-06-18T04:50:22Z

Mathlib CI status (docs):

❗ Batteries/Mathlib CI will not be attempted unless your PR branches off the nightly-with-mathlib branch. Try git rebase 1fb7efad0255f2f227a68e7023c4c5a6a8e42265 --onto 4792cd22887c8b529a351f6563b693426ff2a8f8. You can force Mathlib CI using the force-mathlib-ci label. (2026-06-18 04:50:22)

leanprover-bot · 2026-06-18T04:50:23Z

Reference manual CI status:

❗ Reference manual CI will not be attempted unless your PR branches off the nightly-with-manual branch. Try git rebase 1fb7efad0255f2f227a68e7023c4c5a6a8e42265 --onto 803553a556fd82fa1060efb0c43eda542130cb16. You can force reference manual CI using the force-manual-ci label. (2026-06-18 04:50:23)

This PR makes the compiled implementation of `BitVec.flattenList` share the chunked `Array (Nat × Nat)` balanced-merge core (`mergePass`/`treeMerge`) that `BitVec.ofBoolListLE` already uses, replacing the divide-and-conquer `flattenList.toNatAux` as the `@[csimp]` target. The new `BitVec.Internal.flattenListImpl` packs `chunkCap n = max 1 (64 / n)` width-`n` values per ~64-bit chunk (`packChunkBV`, `collectChunksBV`), then tree-merges, giving `O(1)` stack usage and avoiding the `O(log L)`-depth recursion and the per-node `take`/`drop` sublist allocation of the previous divide-and-conquer worker. For small widths the packing also collapses the bottom `log₂ (chunkCap n)` levels of the merge tree into single machine-word operations. Correctness reuses the existing `Array`-core lemmas (`treeMerge_eq_flattenList`, `flattenList_append`, `flattenList_pack`): a chunk-local `flattenList_collectChunksBV` identity composes with a `toNat_flattenList_eq` bridge that reconciles the head-high orientation of `BitVec.flattenList` with the head-low `Nat × Nat` spec (mirroring the reverse already used by `ofBoolListBE`). The divide-and-conquer `flattenListFast` is kept as a reference implementation; `flattenList_eq_flattenListFast` remains as a plain theorem. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

kim-em and others added 6 commits June 18, 2026 03:57

kim-em added the changelog-library Library label Jun 18, 2026

github-actions Bot added the toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN label Jun 18, 2026

kim-em force-pushed the flattenList-chunked-array-core branch from cac13d7 to 2083268 Compare June 18, 2026 05:37

kim-em force-pushed the flattenList-chunked-array-core branch from 2083268 to 6aace2f Compare June 18, 2026 06:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: route `BitVec.flattenList` through the chunked `Array` merge core#14097

perf: route `BitVec.flattenList` through the chunked `Array` merge core#14097
kim-em wants to merge 7 commits into
leanprover:masterfrom
kim-em:flattenList-chunked-array-core

kim-em commented Jun 18, 2026 •

edited

Loading

Uh oh!

mathlib-lean-pr-testing Bot commented Jun 18, 2026

Uh oh!

leanprover-bot commented Jun 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

kim-em commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mathlib-lean-pr-testing Bot commented Jun 18, 2026

Uh oh!

leanprover-bot commented Jun 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

kim-em commented Jun 18, 2026 •

edited

Loading