perf: route `BitVec.flattenList` through the chunked `Array` merge core #14097
Draft
kim-em wants to merge 7 commits into
…ck overflow

This PR adds tail-recursive replacements for `BitVec.ofBoolListLE` and `BitVec.ofBoolListBE`, registered via `@[csimp]`, to avoid stack overflow on lists with ~1M elements.

The reference definitions in `Init.Data.BitVec.Basic` recurse via `concat`, which is clean for proofs but allocates O(n) stack frames. The new implementations in `Init.Data.BitVec.Impl` pack bits in 64-bit chunks (`packChunk`, `collectChunks`) and combine them via a balanced tree merge (`mergePass`, `treeMerge`), giving O(n log n) work and O(1) stack usage.

Correctness is established via a list-level spec function `flattenList` giving the intended Nat semantics of `(value, width)` pairs, with `flattenList_append`, `flattenList_mergePassList` (key bit-packing identity), and a chunk-local `testBit_flattenList_collectChunks_aux`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
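As a rough illustration of the balanced merge this commit describes, the following Lean sketch works on plain lists rather than arrays. The names `flattenSpec`, `mergePassSketch`, and `treeMergeSketch` are hypothetical, and the head-in-the-low-bits convention for `(value, width)` pairs is an assumption; the actual `mergePass`/`treeMerge` in `Init.Data.BitVec.Impl` may be organised differently.

```lean
/-- Hypothetical spec: concatenate `(value, width)` fields, head in the low bits. -/
def flattenSpec : List (Nat × Nat) → Nat
  | [] => 0
  | (v, w) :: rest => v ||| (flattenSpec rest <<< w)

/-- One merge pass: combine each adjacent pair of fields into a single wider field. -/
def mergePassSketch : List (Nat × Nat) → List (Nat × Nat)
  | (v₁, w₁) :: (v₂, w₂) :: rest =>
    (v₁ ||| (v₂ <<< w₁), w₁ + w₂) :: mergePassSketch rest
  | xs => xs

/-- Repeat merge passes until one field remains. `fuel` bounds the number of
passes; the `0` branch is only reached if `fuel` was chosen too small. -/
def treeMergeSketch : Nat → List (Nat × Nat) → Nat × Nat
  | _, [] => (0, 0)
  | _, [p] => p
  | 0, p :: _ => p
  | fuel + 1, xs => treeMergeSketch fuel (mergePassSketch xs)

-- Three one-bit fields, least significant first: bits 1, 0, 1.
#eval flattenSpec [(1, 1), (0, 1), (1, 1)]        -- 5
#eval mergePassSketch [(1, 1), (0, 1), (1, 1)]    -- [(1, 2), (1, 1)]
#eval treeMergeSketch 3 [(1, 1), (0, 1), (1, 1)]  -- (5, 3)
```

Each pass roughly halves the number of fields, so about log₂ of the initial field count passes suffice, and the spec value is unchanged by a pass.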
….Internal`

Codex review feedback applied:

- Mark `packChunk`, `collectChunks`, `mergePass`, `treeMerge` as `private`: these have unreachable-fuel branches that make them bad public contracts.
- Mark proof-only scaffolding (`flattenList`, `totalWidth`, `WellFormedList`, `mergePassList`) as `private` to avoid exposing them as API.
- Extract `half_le_pow_of_le_double` (the `arr.size ≤ 2^(k+1) → (arr.size+1)/2 ≤ 2^k` bound used in the `treeMerge` halving step) into a standalone lemma. This isolates the omega/`Int.pow_succ` workaround to one place and turns `treeMerge_go_eq_flattenList`'s arithmetic step into a one-liner.

Public surface is now just `ofBoolListLEImpl`, `ofBoolListBEImpl`, and the two `@[csimp]` theorems.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the file-wide `public section` with `public` modifiers on just the two `ofBoolListLEImpl`/`ofBoolListBEImpl` defs and the two `@[csimp]` theorems. Everything else (the chunked-encoding helpers `packChunk`, `collectChunks`, `mergePass`, `treeMerge`, the proof-only `flattenList`/`totalWidth`/`WellFormedList`/`mergePassList`, and all auxiliary lemmas) is now file-local.

Also drop redundant imports `Init.Data.Nat.Lemmas` (transitive via `Init.Data.Array.Lemmas`) and `Init.Data.List.Lemmas` (transitive via `Init.Data.List.Nat.TakeDrop`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This PR marks `BitVec.ofBoolListLE` and `BitVec.ofBoolListBE` as `noncomputable`, so the `@[csimp]` lemmas pointing them at their tail-recursive implementations always take effect and there is no risk of compiling against the non-tail-recursive reference definitions. This also keeps the `elab/csimpCore.lean` test invariant that `@[csimp]` is only applied to `noncomputable` defs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tenList` identities

This PR rewrites the correctness proofs in `Init.Data.BitVec.Impl` so that every step is an unconditional `Nat` identity, removing the `WellFormedList` invariant entirely.

The key observation is that `flattenList_pack` (merging two adjacent `(value, width)` fields) is pure `|||`/`<<<` algebra and needs no `value < 2^width` hypothesis. With that hypothesis gone, `flattenList_mergePassList`, `treeMerge_eq_flattenList`, and `flattenList_collectChunks` no longer need well-formedness either, so `WellFormedList`, `flattenList_lt`, `mergePassList_wellFormed`, `collectChunks_wellFormed`, and the bespoke `testBit`-indexing lemmas all become dead code.

The runtime definitions (`packChunk`, `collectChunks`, `mergePass`, `treeMerge`, `ofBoolListLEImpl`, `ofBoolListBEImpl`) are unchanged; benchmarked speed and all test results are identical. `Impl.lean` shrinks from 680 to 450 lines.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
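The claim that packing needs no `value < 2^width` hypothesis can be spot-checked on concrete numbers. The Lean sketch below is only an illustration: `flattenSpec` and `packAgrees` are hypothetical names, and the head-in-the-low-bits spec is an assumption about how `flattenList` is defined, not a copy of it.

```lean
/-- Hypothetical spec: concatenate `(value, width)` fields, head in the low bits. -/
def flattenSpec : List (Nat × Nat) → Nat
  | [] => 0
  | (v, w) :: rest => v ||| (flattenSpec rest <<< w)

/-- Checks one instance of the packing identity: merging the first two fields
into `(v₁ ||| (v₂ <<< w₁), w₁ + w₂)` leaves the spec value unchanged. -/
def packAgrees (v₁ w₁ v₂ w₂ : Nat) (rest : List (Nat × Nat)) : Bool :=
  flattenSpec ((v₁ ||| (v₂ <<< w₁), w₁ + w₂) :: rest)
    == flattenSpec ((v₁, w₁) :: (v₂, w₂) :: rest)

-- Well-formed fields: 2 < 2^3 and 1 < 2^1.
#eval packAgrees 2 3 1 1 []        -- true

-- Ill-formed first field: 7 ≥ 2^2, yet both sides still agree (both are 47).
#eval packAgrees 7 2 1 1 [(5, 3)]  -- true
```

Both sides expand to `v₁ ||| (v₂ <<< w₁) ||| (r <<< (w₁ + w₂))`, where `r` is the spec value of `rest`, using only distributivity of `<<<` over `|||` and additivity of shifts, which is why no bound on the values is needed.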
This PR shortens several proofs in `Init.Data.BitVec.Impl` without touching the runtime definitions: it replaces hand-rolled `toList_size_one`/`toList_size_zero` proofs with the existing `Array.size_eq_one_iff`/`List.length_eq_zero_iff`, collapses the `cases r` branches of `packChunk_used`/`packChunk_rest`, drops the trivial `mergePassList_nil`/`mergePassList_singleton` lemmas in favour of inline `simp`, and golfs `mergePass_size`, `totalWidth_map_leaf`, and `getLsbD_ofBoolListLEImpl`. `Impl.lean` shrinks from 450 to 426 lines.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cac13d7 to 2083268
This PR makes the compiled implementation of `BitVec.flattenList` share the chunked `Array (Nat × Nat)` balanced-merge core (`mergePass`/`treeMerge`) that `BitVec.ofBoolListLE` already uses, replacing the divide-and-conquer `flattenList.toNatAux` as the `@[csimp]` target.

The new `BitVec.Internal.flattenListImpl` packs `chunkCap n = max 1 (64 / n)` width-`n` values per ~64-bit chunk (`packChunkBV`, `collectChunksBV`), then tree-merges, giving `O(1)` stack usage and avoiding the `O(log L)`-depth recursion and the per-node `take`/`drop` sublist allocation of the previous divide-and-conquer worker. For small widths the packing also collapses the bottom `log₂ (chunkCap n)` levels of the merge tree into single machine-word operations.

Correctness reuses the existing `Array`-core lemmas (`treeMerge_eq_flattenList`, `flattenList_append`, `flattenList_pack`): a chunk-local `flattenList_collectChunksBV` identity composes with a `toNat_flattenList_eq` bridge that reconciles the head-high orientation of `BitVec.flattenList` with the head-low `Nat × Nat` spec (mirroring the reverse already used by `ofBoolListBE`).

The divide-and-conquer `flattenListFast` is kept as a reference implementation; `flattenList_eq_flattenListFast` remains as a plain theorem.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
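To make the chunk capacity concrete, here is a small Lean sketch of the formula `chunkCap n = max 1 (64 / n)` quoted above, together with a simplified head-low packing of width-`n` values into one `(value, width)` field. The names `chunkCapSketch` and `packLowSketch` are hypothetical, and the packing is an assumed simplification, not the PR's `packChunkBV`.

```lean
/-- Number of width-`n` values that fit in a ~64-bit chunk, and at least one. -/
def chunkCapSketch (n : Nat) : Nat := max 1 (64 / n)

#eval chunkCapSketch 1    -- 64
#eval chunkCapSketch 8    -- 8
#eval chunkCapSketch 24   -- 2
#eval chunkCapSketch 100  -- 1 (a value wider than 64 bits gets its own chunk)
#eval chunkCapSketch 0    -- 1 (`64 / 0 = 0` in Lean, so the `max` applies)

/-- Assumed head-low packing: the first value of the list lands in the low bits. -/
def packLowSketch (n : Nat) : List Nat → Nat × Nat
  | [] => (0, 0)
  | v :: rest =>
    let r := packLowSketch n rest
    (v ||| (r.1 <<< n), r.2 + n)

-- Three 4-bit values packed into one 12-bit field: 0x321 = 801.
#eval packLowSketch 4 [0x1, 0x2, 0x3]  -- (801, 12)
```

With `n = 4`, for instance, up to 16 values would share one chunk, so the four lowest levels of the merge tree are replaced by word-sized shifts and ors.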
2083268 to 6aace2f
This PR makes the compiled implementation of `BitVec.flattenList` share the chunked `Array (Nat × Nat)` balanced-merge core (`mergePass`/`treeMerge`) that `BitVec.ofBoolListLE` uses, replacing the divide-and-conquer `flattenList.toNatAux` as the `@[csimp]` target. The new `BitVec.Internal.flattenListImpl` packs `chunkCap n = max 1 (64 / n)` width-`n` values per ~64-bit chunk (`packChunkBV`, `collectChunksBV`), then tree-merges. This recovers `O(1)` stack usage and removes the per-node `take`/`drop` sublist allocation of the previous `O(log L)`-depth divide-and-conquer worker; for small widths the packing also collapses the bottom `log₂ (chunkCap n)` levels of the merge tree into single machine-word operations.

Correctness reuses the existing `Array`-core lemmas (`treeMerge_eq_flattenList`, `flattenList_append`, `flattenList_pack`): a chunk-local `flattenList_collectChunksBV` identity composes with a `toNat_flattenList_eq` bridge reconciling the head-high orientation of `BitVec.flattenList` with the head-low `Nat × Nat` spec (mirroring the reverse already used by `ofBoolListBE`). The now-superseded divide-and-conquer `flattenListFast`/`flattenList.toNatAux` and the `flattenList_eq_flattenListFast` theorem are removed.

Blocked by #13576 (`fix: tail-recursive BitVec.ofBoolListLE/ofBoolListBE to avoid stack overflow`): this branch is stacked on it and reuses its `BitVec.Internal` `Array` merge core, so the diff shown here includes that PR's changes until it merges. Review/merge #13576 first; this PR should then be retargeted/rebased onto the result.

🤖 Prepared with Claude Code