Experimenting with different adder implementations by tautschnig · Pull Request #7993 · diffblue/cbmc

tautschnig · 2023-11-02T12:15:05Z

Depends-on: #8990

The experiments were performed using the following example:

 #ifndef SIZE
 #define SIZE 100
 #endif

 #ifndef UNSAT
 int main()
 {
   int a[SIZE], b[SIZE];
   for(int i=0; i < SIZE; ++i)
     __CPROVER_assert(a[i] + b[i] > a[i], "");
 }
 #else
 #include <limits.h>

 int main()
 {
   int a[SIZE], b[SIZE];
   for(int i=0; i < SIZE; ++i)
   {
     __CPROVER_assert(
       b[i] <= 0 ||
       (a[i] >= (INT_MAX >> 28) || b[i] >= (INT_MAX >> 28)) ||
       (a[i] <= (INT_MIN >> 1) || b[i] <= (INT_MIN >> 1)) ||
       a[i] + b[i] > a[i], "");
   }
 }
 #endif

See https://tinyurl.com/adder-comparison
(https://docs.google.com/spreadsheets/d/1ckrIwxe5QVUu12rs3OaY3o7CQ-UqFnEWvcEOf8cw580/) for results.

Each commit message has a non-empty body, explaining why the change was made.
Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
n/a The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
My commit message includes data points confirming performance improvements (if claimed).
My PR is restricted to a single feature or bugfix.
n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

codecov · 2023-11-03T09:50:32Z

Codecov Report

❌ Patch coverage is 0.97087% with 306 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.32%. Comparing base (6df204e) to head (1fd90bb).

Files with missing lines	Patch %	Lines
src/solvers/flattening/bv_utils.cpp	0.97%	306 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #7993      +/-   ##
===========================================
- Coverage    80.46%   80.32%   -0.14%     
===========================================
  Files         1704     1704              
  Lines       188665   188973     +308     
  Branches        73       73              
===========================================
- Hits        151805   151801       -4     
- Misses       36860    37172     +312

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

…ound Beame and Liew (CAV 2017 / JACM 2019) proved polynomial-size regular resolution refutations exist for commutativity and any degree-2 ring identity on array, diagonal, and Booth multipliers, and quasi-polynomial (n^O(log n)) for Wallace tree multipliers. This refutes Biere's 2016 conjecture of exponential hardness. Our empirical CDCL data (N2 experiment, tab:n2) is still valid but the framing must change: what we observe is NOT an inherent proof-complexity lower bound -- it is the gap between the *existence* of short proofs (proven) and *CDCL's ability to find them* on standard encodings. Beame and Liew themselves identify this as an open engineering problem, calling for 'guiding information to add, either to the formulas derived from the circuits or to CDCL SAT solver heuristics'. Our encoding- selection approach is precisely such guidance. Reframing applied in six places: 1. Abstract: 'SAT hardness' -> 'CDCL search difficulty'; adds reference to Beame-Liew upper bound. 2. §1 contributions (#2): 'causal hardness driver' -> 'causal driver of CDCL search difficulty' with explicit note that proof-theoretic hardness does not hold. 3. §3 Carry Propagation Hardness: new paragraph 'The gap is between proof existence and CDCL search' with Beame-Liew quotation; removes the old 'grows exponentially' claim and Haken-1985 reference as misleading analogies. 4. §3 N2 observation (1): 'Carry presence is the dominant causal factor' -> 'Carry presence dominates CDCL search difficulty'. 5. §3 N2 observation (4) and Scope of the claim: rewritten to cite Beame-Liew upper bound and frame our result as CDCL-search-relative, not proof-complexity-absolute. 6. Conclusion: reframes the carry-propagation finding as CDCL search difficulty; replaces the N1-as-future-work paragraph with 'can CDCL be steered to find Beame-Liew's polynomial proofs without encoding guidance' as the open question. Also adds to Related Work: new 'Resolution upper bounds for ring identities' paragraph introducing Beame-Liew's construction, citing their open problem, and positioning our work as addressing it via encoding engineering. New bib entries: beame2017towards (CAV), beame2019toward (JACM). Paper 1: 21 pages, builds clean. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

…ing-learned-clauses Adds two more bibliography entries in the 'escape Resolution hardness' line of work, and extends the Related Work and Conclusion to contextualize our encoding-selection approach alongside them. New bib entries: - danner2025sat: Julian Danner, 'SAT Solving Using XOR-OR-AND Normal Forms and Cryptographic Fault Attacks', PhD thesis, University of Passau, April 2025 (reviewers: Kreuzer, Biere). Dissertation is citable; it develops CDXCL/CDXCLlite, a CDCL generalization to XNF, implemented as Xorricane/2-Xornado, provably escaping Resolution lower bounds on XOR-rich instances. - pollitt2026factoring: Pollitt, Battleman, Fleury, Vizel, Heule, Biere, Bryant, 'Factoring Learned Clauses', SAT 2026 (to appear). Revives Extended Resolution in modern CDCL: CaDiCaL-ER (GlucosER- style definition introduction during conflict analysis) and CaDiCaL-FX (BVA extended to XOR/ITE gate detection on learned clauses), with polynomial-size ER proofs for Tseitin formulas. Paper changes: - Related Work: The 'Resolution upper bounds for ring identities' paragraph (which discusses Beame-Liew 2017/2019) is now followed by a new 'Escaping resolution hardness in CDCL' paragraph that positions three complementary approaches: (i) encoding-level guidance (our paper), (ii) solver-level generalization to stronger proof systems (Danner 2025, and related work on CDCL + XOR reasoning with bidirectional Res(XOR) simulation), (iii) inprocessing-level definition introduction (Pollitt et al. 2026 CaDiCaL-FX). - Conclusion: updated fourth future-work item to acknowledge that two parallel research lines (solver-level generalization and inprocessing-level factoring) are also active, and that encoding-, solver-, and inprocessing-level guidance are combinable. The Beame-Sun 'CDCL(XOR)' paper (sat-paper90.pdf) is discussed without citation, as it is accepted but not yet published; we credit the broad line of work instead. Paper 1: 21 pages, builds clean. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

Authors (alphabetical by surname): - Armin Biere (Amazon Web Services and University of Freiburg) - Martin Brain (City St George's, University of London) - Daniel Kroening (Amazon Web Services) - Norbert Manthey (Amazon Web Services) - Michael Tautschnig (Amazon Web Services and Queen Mary University of London) Replaces the 'Michael Tautschnig + TBD' placeholder. Paper 1: 21 pages, builds clean. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

The revised research plan set reflects the discovery that Beame & Liew (CAV 2017, JACM 2019) have polynomial-size regular resolution proofs of degree-2 ring identities on array, diagonal, and Booth multipliers - refuting the exponential-hardness conjecture that underpinned the earlier N1 plan. **N1 (rewritten)** - research-plans/N1-cdcl-vs-beame-liew-gap.md: Replaces the exponential-lower-bound plan (now dead). New goal: characterize why CDCL does not find Beame-Liew's polynomial proofs on standard multiplier encodings, and under what formula- level transformations it does. Three proposed approaches (empirical, interpolation-theoretical, hybrid). Supersedes N1-exponential-lower-bound.md, which is removed. **N3 (new)** - research-plans/N3-beame-liew-implementation.md: Plan to implement Beame-Liew's critical-strip construction as an executable DRAT proof generator. Reviewer communication indicated Beame's student attempted and failed; the failure mode is undocumented. We propose phased implementation (single strip -> multi-strip -> full refutation -> scaling experiments) with heuristic extraction as a stretch goal. Orthogonal to N1 but could supply Approach A's gold-standard reference proof. **N4 (new)** - research-plans/N4-multi-encoding.md: Ablation design for a new encoding technique: encode the same multiplication with two different encodings, shared inputs and outputs, separate internal variables, output-equality constraints. Implemented as 'secondary_encoding' string on bv_utilst, wrapper at the top of unsigned_multiplier, env-var CBMC_MULTI_ENCODING wiring in smt2_solver. Early sanity check on comm_12 (algebraic disabled): - shift-add alone: 533s (T/O-prone) - comba-cs alone: 1.04s - multi comba-cs + shift-add: 1.08s suggests a portfolio-like effect: ~4% overhead vs the faster single encoding, 500x speedup vs the slower one. Full ablation sweep is running in the background; plan describes three possible outcomes (positive / portfolio / negative) and their writeup implications. Implementation details: - src/solvers/flattening/bv_utils.h: new member 'secondary_encoding' (std::string), getter/setter. - src/solvers/flattening/bv_utils.cpp: wrapper at top of unsigned_multiplier that, when secondary_encoding is non-empty, computes both encodings and set_equal's outputs. Handles shift-add/comba/comba-cs/dadda/dadda-cs/wallace/ booth/block4/sortnet as valid secondary names. - src/solvers/flattening/boolbv.h: passthrough setter. - src/solvers/smt2/smt2_solver.cpp: reads CBMC_MULTI_ENCODING env var in configure_encodings. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

Mirror the cross-cutting notation and clarity fixes made in the co-authoring repository (~/multiplier-encodings.git) so the two versions of Paper 1 stay close in substance even while Martin's intro rewrite diverges. Changes applied here: - 'bitwidth' -> 'bit-width' (15 occurrences) for notation consistency. - 'partial product matrix / array' -> 'partial-product grid' throughout. - Expanded Comba description in Background to explicitly explain why the column-wise + immediate-carry combination limits inprocessing. - Rewrote E0-E4 variant descriptions in §3 with explicit accumulation-rule pseudo-code, plus a preamble clarifying that all five variants share identical partial products and differ only in the accumulator. - Added kroening2016decision and barrett2021smt bib entries (as canonical references for bit-blasting). Paper 1: 21 pages, builds clean. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

N4 ablation sweep completed: 9 benchmarks x 4 primary encodings x 7 secondary encodings (including singleton baselines) = 216 measurements on CaDiCaL with algebraic pre-solver disabled, 60s timeout. Headline finding: multi-encoding exhibits PORTFOLIO-LIKE BEHAVIOR. The solver receives both encodings of the same multiplication with shared inputs and outputs, and performs approximately as a free portfolio of the two singletons: - Median multi/min(A,B) = 1.31x (30% overhead vs best singleton) - Geom mean multi/min(A,B) = 1.23x - Median multi/max(A,B) = 0.14x (7x faster than slower singleton) - 42% of multi runs within 1.25x of min(A,B) - 40% more than 1.5x slower than min(A,B) -- trade-off is real - 44 'rescue' cases where one singleton T/Os but multi solves New Paper 1 subsection 5.4 'Multi-Encoding Combinations' presents the implementation, the ablation design, and the portfolio finding with a headline table. Places the technique alongside the adaptive encoding-selection heuristic in §5 as a complementary strategy when pattern recognition is not reliable. Updates N4 research plan to 'Executed' status with the headline statistics and the verdict (Outcome N: portfolio-like). Raw data: data/multi-encoding-results.tsv (216 measurements). Paper 1: 22 pages, builds clean. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

Applied ~17-20 word-form replacements per paper via a conservative Python script (tmp/us-to-uk.py): - -ize -> -ise family: normalize, characterize, mechanize, generalize, optimize, recognize, minimize, maximize, formalize, factorize, organize, emphasize, utilize, hypothesize, specialize, initialize, standardize, realize, summarize. - -ization -> -isation (where applicable). - -yze -> -yse: analyze, catalyze. - -or -> -our: behavior, favor (present in 1 location). - -el -> -ell doubling: modeled -> modelled, labeled -> labelled (not present in our text but included for safety). - US catalog -> UK catalogue (1 location, already British from earlier edits). LaTeX environment and command contexts preserved (negative lookbehind for backslash ensures e.g. \color{...} in todo macros and \begin{itemize} are untouched). Technical CS terms kept: 'program' (software sense) stays in 'program verification'; 'size' stays as is. Also corrected the three 'Mechanized' (capitalised) missed by the initial pass. Paper 1: 22 pages, builds clean. Paper 2: 20 pages, builds clean. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

Mirrors the senior-reviewer fix commit d4f06dc from multiplier-encodings.git. Same substantive changes: - Harmonise BVE story in §5.1 (three-regime breakdown). - Fix broken table/section references (Table 4 -> Table 10; Section 6 below -> Table tab:best). - Fix DRAT size for strength_chain_16 (10 MB -> 3.3 MB). - Remove companion-algebraic citations (Paper 2 not in submission state yet; do not cite). - Clarify multi-encoding T/O accounting. - Soften 'Proof-guided methodology' -> 'Proof-guided encoding design' with explicit scope. - Fold §6.5 Multi-Encoding into §5.2 Multi-Encoding as Hybrid Selection. - Remove stale 1728-configuration sweep. - Comment out §4 'Development effort' paragraph. - Add benchmark-pool intro to §7; add missing labels. Paper 1: 22 pages, builds clean. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

Mirrors commit from multiplier-encodings.git. Body reduced from 19 pages to 16 pages by moving supplementary tables to the appendix and compressing prose in §3 (N2 variants and observations), §4 (automation paragraph), §6.1 (Booth), §7.10 (SMT-COMP), and §8 (Negative Results). No claims removed. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

Same four fixes as in multiplier-encodings.git e775198: - Figure 1 Pass 1/Pass 2 labels moved below their arrows (were above, colliding with 'Carry bits:' and 'Result:'). - Adaptive-heuristic ablation table moved to a new appendix, with a compact summary retained in the body. - Conclusion + future-work condensed: single summary paragraph + single open-directions sentence, down from ~45 lines to ~25. (No duplicate subsection heading in this repo's version, so that fix is not mirrored.) Main body now ends on page 16-17 (21 pages total). Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

Same six fixes as multiplier-encodings commit 7xxx: 1. Reworded three Martin Brain 'private communication' references to 'earlier unpublished work by one of the authors' (Martin is now a co-author). 2. Redesigned Figure 1 with header-style section labels above each block group, inline 'sums'/'carries' labels, and Pass labels properly spaced above arrows. 3. Folded the one-sentence DRAT lead-in into the ablation paragraph. 4. Removed 'A methodological caveat: encoding flags must reach the solver' paragraph (internal engineering observation, no scientific value to this audience). 5. Merged three-paragraph §7 opening into a single paragraph. 6. Tightened paragraphs with 1-2 word orphan last lines. Main body ends on page 16; total 21 pages. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

Mirrors multiplier-encodings.git commit: adds Wallace as an eighth encoding to the comparison. Key finding (Wallace does NOT track Dadda, differs by up to 14x on commutativity) applied to: - Contribution bullet (six- -> eight-encoding) - Section 2 Background (Wallace alongside Dadda) - Figure 2 scaling plot (seven encodings including Wallace) - Appendix D tab:proof-sizes (eight encodings including Wallace) Same data: - Wallace proof sizes on comm_11/13 are 94 MB / T/O (vs Dadda 6.5 MB / 98.6 MB) - dramatically worse - Wallace scaling times 18.6x slower than Dadda at BW=11 - Wallace tracks Dadda on non-commutativity benchmarks (mul_ineq, overflow, strength reduction: within 2x) Paper still 21 pages; body ends page 16. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

Mirrors multiplier-encodings.git commit to Paper 1 plus related Paper 2 update. Paper 1 (doc/paper-bitblasting/paper.tex): Same 10 review items (M1-M10) plus cut/compress/polish as the multiplier-encodings commit. Body reduced from 19 to 14 pages. Paper 2 (doc/paper-algebraic/paper.tex): - Replace the short 'two classes' taxonomy (universal/equational vs existential) with the full 4-class taxonomy (universal vs existential, crossed with equational vs relational) that moved here from Paper 1. The full taxonomy is a more natural fit for Paper 2 since the algebraic techniques cover the universal equational quadrant specifically. - Remove duplicate 'bit-blasting companion paper' citation that now only needs to appear once. New SMT-COMP isolation data files: - doc/paper-bitblasting/data/smt-comp-combacs-nosimp.tsv (66 benchmarks, smt2_solver, combacs encoding, no simplify/algebraic) - doc/paper-bitblasting/data/smt-comp-shiftadd-nosimp.tsv (same, shift-add encoding) Both configurations solve 41/66 with identical PAR-2 (52.1 s); per-benchmark ratio on both-solved between 0.83x and 1.15x, confirming the encoding alone is neutral on a broad community sample. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

Mirrors multiplier-encodings.git author-review tweaks 3-5 (the §2 grammar fix did not apply here because cbmc-github had different phrasing already) and applies the same M1 treatment to the intro that was applied to Paper 1 in the previous commit on the other repo. - Remove the shorter 2-class taxonomy from cbmc-github intro to match multiplier-encodings (full 4-class taxonomy now lives in Paper 2 intro; a single-sentence segue remains in Paper 1 §10). - 'An influential line of work' -> 'Beame and Liew'. - Fix double-hyphen ligature in \texttt{--refine-arithmetic}. - Spell out Brent-Kung [BK] at first use. Both papers verified: --refine-arithmetic renders as two hyphens in the output, Brent-Kung is spelled out in §8.2 (not only in Appendix A). Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

Mirrors multiplier-encodings.git: adds Burch 1991 (early BDD-based multiplier verification), Kumar et al. 2023 (BDD with partial-product variables, O(n^3) size up to 1024 bits), and Dutertre 2020 (empirical evaluation of 16 SAT solvers on QF_BV CNFs finding CaDiCaL superior). Same integration points: Burch and Kumar in §9 'Algebraic verification'; Dutertre in §7.3 'Cross-Solver Analysis'. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

Same fix as multiplier-encodings.git: restore the five per-row mechanistic explanations of which encoding wins on which pattern, plus the adaptive-selection consequence. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

Captures the research-plan status (N1/N2/N3/N4), paper status, and recommended prioritisation after the senior-review passes and the Wallace integration. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

…ommutativity Implements the first prototype for research plan N3 (Beame-Liew critical-strip construction). What works: - bench-multiplication/n3-beame-liew/generate_array_mul_comm.py: self-contained DIMACS generator for a*b = b*a on an n-bit array multiplier. Validated UNSAT at n = 1..11 by CaDiCaL and minisat. - bench-multiplication/n3-beame-liew/beame_liew_phase1_v2.py: emits a DRAT refutation by (i) 2^(2n) leaf 'branch-cut' RUP lemmas (one per (a, b) assignment), (ii) a binary-resolution tree over the 2n input variables, (iii) the empty clause. The proof validates with drat-trim at every tested n (1..11). - On the same CNFs, the prototype's DRAT proof is 4-10x smaller than CaDiCaL's DRAT output at n <= 8 and remains smaller at n = 9..10 (where CaDiCaL's baseline is itself incomplete at the 60s timeout). n BL DRAT CaDiCaL DRAT ratio 2 307 5,002 0.061x 4 9,987 41,560 0.240x 6 266,229 994,916 0.267x 8 6.1 MB 26.4 MB 0.230x 10 127 MB 213 MB 0.596x What this is NOT: This is not the polynomial-size O(n^6 log n) critical-strip refutation from Beame & Liew 2019. The prototype is O(2^(2n) * poly(n)), dominated by the flat enumeration. Phase 2-3 of the N3 plan would replace the 2^(2n) leaves with a read-once branching program of size poly(n) exploiting the Delta = log(2n) strip decomposition. Even so, the prototype establishes that: (i) the Phase 1 infrastructure works and drat-trim accepts the output; (ii) a structured proof (enumerate first, resolve second) can already beat CDCL's proof size at small n by a constant factor, suggesting proof structure itself is a useful piece of guidance. Also updates research-plans/N3-beame-liew-implementation.md with a Status section summarising the above. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

Extends the N3 work from the previous Phase 1 commit with a structural Phase 2 prototype, a BDD sanity check, and an honest writeup of why the polynomial Beame-Liew bound remains out of reach within the time budget of one session. New artefacts under bench-multiplication/n3-beame-liew/: - generate_array_mul_comm_meta.py: CNF generator that tags every variable with its role (input bit, partial product, accumulator, carry, output, diff). Required for any strip-level analysis. - beame_liew_phase2_v2.py: produces a column-structured DRAT refutation. For each output bit k in [0, 2n) the generator emits a Phase-1-style resolution tree terminating in the unit -diff[k], then UP on the diff-OR clause gives the empty clause. Validates with drat-trim at n = 2..7. The proof is strictly LARGER than Phase 1's flat enumeration by ~2n (empirically 5.6x at n=2 -> 15.8x at n=7), confirming that per-column decomposition alone cannot beat flat enumeration: each column sub-proof has a different suffix literal and cannot share the input-enumeration tree. - phase2_bdd_probe.py: builds BDDs for c[k] and d[k] using the dd Python package and confirms structural equality as a sanity check that the array multiplier commutativity CNF is correctly encoded. Reports per-bit BDD DAG sizes. Empirical BDD sizes match Bryant 1991: peak BDD DAG size 462 at n=6, 4,419 at n=8, 41,148 at n=10. A BDD-based DRAT translation cannot yield a polynomial proof either. Key technical findings, documented in the README and in the N3 plan's Status section: 1. Unit propagation alone cannot derive -diff[k] from partial-input strip refutations (UP does not resolve biconditionals). 2. Per-column DRAT lemmas cannot share the 4^n-leaf input enumeration because their suffix literals differ. 3. BDDs of multiplier outputs are exponential (Bryant 1991). 4. The polynomial Beame-Liew bound requires extension variables in the DRAT proof (the branching program branches on internal tableau variables and encodes shared structure via RAT steps, not RUP). Phase 3 work -- branching program construction + RAT translation via Krajicek's Prop. 2.6 -- remains open. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

Two quick follow-up investigations after the Phase 2 prototype: Option B (trimmed-core comparison): - Extracted CaDiCaL's DRAT proof core using drat-trim -l - Compared against Phase 1's proof size - Finding: even after trimming, CaDiCaL's core is 2.5-4x larger than Phase 1 at n >= 5 (0.31x-0.40x ratio, stable across n) - Conclusion: Phase 1's advantage is structural (ordered enumeration + explicit resolution tree), not a CDCL-noise artefact that trimming could fix. - Raw data in data-options-ba.tsv; writeup in options-B-A.md. Option A (RAT extension-variable test): - Wrote rat_test.py: 4-clause UNSAT with a RAT-introduced extension variable x3 := x1 AND x2 used in deriving empty. - drat-trim reports s VERIFIED for the proof. - Confirms drat-trim accepts RAT-format clauses introducing extension variables, so the toolchain supports the proof format needed for a future polynomial Phase 3 construction. - (The trimmed report shows '0 RAT lemmas in core' because RUP derivations succeed for these particular lemmas on a small formula; RUP is a special case of RAT, so this is expected drat-trim behaviour.) Together these clarify (a) that Phase 1's empirical win is robust to trimming, and (b) that the proof format permits the extension-variable encoding that a genuine polynomial Phase 3 proof would need. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

Downloaded the 29-page arxiv v2 of Beame-Liew 'Towards Verifying Nonlinear Integer Arithmetic' (1705.04302v2) which contains the full critical-strip construction (§3, §4) referenced in the N3 research plan. Needed for Phase 3 implementation. Key construction details (§3.3): - The refutation is a read-once branching program (Proposition 2.1 translates BP to ordered resolution directly). - For each k in [0, 2n], the BP branches on the 'first-disagreeing' output bit being bit k. - Each strip phi_Strip(k) is the sub-formula with tableau variables in columns [k-Delta, k] where Delta = log n. - The per-strip BP branches on inputs row-by-row, merges on 'Cut(j)' states of size O(log k). - Size bound: O(k^5 log k) per strip (Corollary 3.3), O(k^6 log k) total. No RAT / extension variables are needed -- the BP branches on existing CNF variables, and merging at Cut(j) nodes gives the polynomial size. Option A's RAT-readiness is therefore not needed for the actual construction. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

Implements extraction of phi_Strip(k) from the full commutativity CNF per Beame-Liew 2017 §3.3 Definition. A clause is in the strip iff it references a column-indexed variable (pp, acc, cry, or diff) whose column is in [k-Delta, k]. Plus: - tableau symmetry clauses pp_c[i][j] = pp_d[j][i] from the preprocessing step in the paper; - ZERO-constant unit clauses (without which the carry-chain starts from an unconstrained value and the strip becomes trivially SAT). Validation: at n=4, 6, 8 for k=1..2n-1, the extracted phi_Strip(k) is UNSAT (verified by CaDiCaL). This matches Beame-Liew's Lemma 3.1. Note: early attempts omitted (a) non-tableau carry/acc clauses and (b) ZERO-constant units, producing SAT strips. Both were bugs in the extraction, not in the paper. Step 1 complete. Step 2 (per-strip BP construction with Cut(j) merging) is the next milestone. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

…nential Built BDDs for all strip variables using the dd package to measure the BP size we'd get if the BP branched in input-variable order. Results confirm that naive BDD-based BPs are exponential for middle-k strips, matching Bryant 1991: n=4, k=5: peak BDD 55 n=6, k=7: peak BDD 462 n=8, k=9: peak BDD 3,537 n=8, k=11: peak BDD 4,419 The Beame-Liew O(n^6 log n) polynomial bound relies on a SPECIFIC variable ordering (outputs o^yx first, then tableau row-by-row, with merging at Cut(j) states). Our input-variable ordering does not achieve polynomial size. Files: - phase3_bdd_size.py: measures BDD sizes per strip variable. - options-B-A.md updated with the strip BDD-size table and an honest explanation of why Step 2 requires the paper's exact variable ordering (not yet implemented). The polynomial construction remains the open Phase 3 work. phase3_bp.py was removed (it contained scaffolding for the wrong approach; will rebuild when implementing the paper's BP order). Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

Rather than hand-implementing Beame-Liew's exact BP construction (which requires weeks of careful work on the Cut(j) merging structure), prototype the strip decomposition by: 1. Extract phi_Strip(k) for each k in [1, 2n-1]. 2. Run CaDiCaL on each strip to get a DRAT refutation. 3. Compose via top-level case split on k. Empirical result (bench-multiplication/n3-beame-liew/data-phase3-perstrip.tsv): n PerStrip Phase 1 CaDiCaL full PS/P1 PS/CaD 4 137 KB 10 KB 42 KB 13.68x 3.29x 6 1.2 MB 266 KB 995 KB 4.58x 1.23x 7 4.6 MB 1.3 MB 5.2 MB 3.56x 0.88x 8 22 MB 6.1 MB 26 MB 3.55x 0.82x Per-strip CaDiCaL beats full-CNF CaDiCaL at n >= 7 (0.82x at n=8, ratio improves with n). So structurally decomposing the proof into strips helps a SAT solver, but per-strip is still strictly worse than Phase 1's ordered case-analysis enumeration (3.55x larger at n=7, 8). Obtaining Beame-Liew's O(n^6 log n) polynomial would require hand-implementing the BP with the paper's specific variable ordering and Cut(j) merging. That is documented as future work in options-B-A.md; not implemented in this session. Summary update in options-B-A.md enumerates what's committed across Phase 1/2/3 and what remains. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

…nance Following Armin's review (widen corpus before revisiting remaining work), acquired the full SMT-LIB 2024 QF_BV release (Zenodo 10.5281/zenodo.11061097): 46,191 benchmarks, of which 15,435 contain bvmul and are declared unsat. Headline finding: triaging the full corpus surfaced the 'float' family (FP-as-BV, Haller-Griggio-Brain-Kroening FMCAD 2012), absent from our 66-benchmark sample, where our algebraic solver DOMINATES due to wide-multiplier structure (53x53->106-bit significand products): On 75 float bvmul-unsat benchmarks @ 60s (low-contention): ours 68 solved Bitwuzla 51 solved cvc5 6 solved 18 CONFIRMED unique wins (standalone: neither Bitwuzla nor cvc5 solves within 60s; all declared unsat; our verdict agrees with Bitwuzla everywhere both solve -> sound). This adds 18 to the paper's previous 4 wins-beyond-all-solvers. Methodological note recorded in RESULTS.md: a first pass at 36-way parallelism (= core count) manufactured spurious timeouts via contention (newton.3.3.i 'won' at 36-way but Bitwuzla solves it standalone in 31s). All win claims re-confirmed at 1-6 way parallelism. Evaluation should run at <= cores/4 way. Committed (repo kept lean; full 35GB corpus stays in staging): bench-multiplication/float-fp2bv/ pow5.smt2, mul_03_30_4.smt2 -- two smallest clean wins MANIFEST.tsv -- sha256+size for all 75 (re-fetch) results-{ours,bitwuzla,cvc5}.tsv -- full per-benchmark evidence run-any.sh, run-corpus.sh -- parallel evaluation harness RESULTS.md -- writeup + reproduction steps Also updated doc/paper-algebraic/benchmark-sources.md with the acquisition status and the AIG->SMT-LIB converter blocker for the Tier B Konrad gate-level archives. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

The float-family discovery reshapes the priority list: - Add Item 11 'Exploit the float (FP-as-BV) family' as NEW HIGH PRIORITY. 18 unique wins-beyond-all-solvers on a canonical third-party benchmark family (quadruples headline wins 4->22). Four sub-tasks: adopt into eval corpus, diagnose 7 misses, write into paper, probe rest of FP. Sub-tasks 1+3 need no new code (data already collected). - Rewrite Recommended sequencing into four tiers: Tier 1 (now): Item 11.1+11.3 -- adopt float + write paper result. ~2.5 days, biggest headline gain, zero new code. Tier 2: Item 11.2 (diagnose misses) + Item 10 Phase 1 (fragment characterisation) + Item 3 (Sage2 scalability, now better quantified: 24 unique fails in stratified run). Tier 3: Item 10 Phase 2 gate extensions (SBIF, vanishing removal), prioritised by the miss diagnosis. Tier 4: Item 7 gate-level (Konrad AIGs downloaded, awaiting an AIG->SMT-LIB converter). - Update top-of-file status with the +18 float wins. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

A co-author asked whether FP operations actually traverse our algebraic path (we use float_utilst, not float_bvt). Checking this disproved the previous commit's float-family claim: 1. The float benchmarks are PURE QF_BV -- no Float sort, no fp.* operators (the fp.iN tokens are just variable names declared (_ BitVec N)). So no FP lowering runs; float_utilst/float_bvt is irrelevant here. 2. Ablation (DISABLE_ALGEBRAIC=1) shows the algebraic path is NOT responsible for the float 'wins' -- and is net overhead: pow5 1.28s ON == 1.28s OFF test_v7_..s703 31.2s ON -> 15.3s OFF (FASTER without algebra) newton.3.3 8.3s ON -> 6.7s OFF All still solve unsat with algebra OFF via bit-blasting + MiniSAT (pow5: 171 SAT conflicts). The wins over Bitwuzla/cvc5 are a SAT-backend artifact, not evidence for the algebraic method. Corrections: - RESULTS.md: prominent CORRECTION block + retraction note on the headline. Data retained for the record. - benchmark-sources.md: headline corrected. - remaining-work.md: top status corrected (count stays 4, not 22); Item 11 marked RETRACTED with the ablation evidence. New Item 12 (HIGH PRIORITY, Tier 0): ablation re-audit of ALL claimed algebraic wins. The float episode showed we lacked ablation discipline -- a 'win' only supports the thesis if it survives DISABLE_ALGEBRAIC=1. Demonstrated the audit distinguishes genuine from artifact: SABER n16-bw13: 0.02s ON vs 60s TIMEOUT OFF -> GENUINE assoc_64: 0.01s ON vs 0.01s OFF -> ARTIFACT Recommended-sequencing reordered: Item 12 is now Tier 0, gating credibility of all headline numbers before submission. The corpus widening itself (full QF_BV acquired + triaged, harness built) remains valid; only the float attribution is retracted. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

…S BUGS Ran every paper claim algebra-ON vs DISABLE_ALGEBRAIC=1, single- threaded, cross-checked against z3/Bitwuzla/declared-status. GENUINE (survive ablation, sound): - 4 Brain wins-beyond-all-solvers: NEEDS_ALGEBRA, sound - 5 SMT-COMP unlocks (cohencu_0..3, geo3.c_5): NEEDS_ALGEBRA - 41/66 = 31 bit-blast + 10 algebra (restate in paper as 10/31) - SABER: 26/31 NEEDS_ALGEBRA + 1 ALGEBRA_FASTER (cleanest) - DSP mac_equiv: ALGEBRA_FASTER (0.14s vs 45.8s) No unsoundness in the UNSAT direction. CRITICAL FINDING - 2 soundness bugs (algebra says SUCCESSFUL, bit-blast+z3 say FAILED; the dangerous direction): - assoc.c (BW=9): assertion operands cast to signedbv[32] (C integer promotion); extractor reasons in 9-bit ring, ignores the cast, proves a false-at-32-bit identity. - mul_zero_factor.c (BW=8): {a*b=0, a!=0, b!=0} is SAT over Z/2^8 (zero divisors: 16*16=0) but algebra refutes it (assumes integral-domain/field; no zero divisors). div_roundtrip = ARTIFACT (bit-blast also fast). Added Item 13 (CRITICAL, blocks submission) with fix directions for both bugs + verification plan + Lean reconciliation note. Item 13 is now Tier 0 ahead of everything. Custom-suite '39/39 validates correctness' claim must be withdrawn. Artifacts: doc/paper-algebraic/ablation-audit-2026-06-01.md; bench-multiplication/ablation-audit/{smtcomp66,brain-wins,saber, custom-c}.tsv + ablation{,-cbmc}.sh harness. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

The Item 12 ablation audit found two cases where the algebraic pre-solver reports VERIFICATION SUCCESSFUL when the correct answer (per bit-blasting and z3) is FAILED -- claiming no bug where a bug exists. Both are fixed by conservative guards in boolbvt::try_algebraic_solve that fall back to bit-blasting on the unsound patterns; genuine algebraic wins are unaffected. Bug A -- widening cast of a defined intermediate (assoc.c). C integer promotion encodes `(a*b)*c == a*(b*c)` on 9-bit operands as cast(ab,signedbv[32])*cast(c,signedbv[32]) == ..., where ab is the SSA symbol defined by `ab = a*b` truncated to 9 bits. The single-ring extractor pinned the ring at 9 bits (from the defining equations) and silently absorbed the 32-bit cast, treating ab as the exact 9-bit product inside a 32-bit multiply -- so it "proved" the (9-bit-only) associativity identity. Fix: drop any (dis)equality in which a widening typecast of a defined-intermediate symbol is a DIRECT operand of arithmetic (* + - unary-). This is precise: widening a primary INPUT (mul_overflow's cast(a,32)*cast(b,32)) is a faithful zero/sign extension and is kept; a widen-then-narrow (matrix_mul's cast(cast(c00,32),8)) is not an arithmetic operand and is kept; SMT-LIB benchmarks carry no ID_typecast and are untouched. Bug B -- zero divisors over ZMod(2^d) (mul_zero_factor.c). A disequality x!=0 is refuted via the Rabinowitsch trick x*e-1=0, which asserts x is a UNIT. Over a field that equals x!=0, but over ZMod(2^d) a non-zero element need not be invertible, so {a*b=0, a!=0, b!=0} (SAT via 16*16=0 mod 256) was wrongly refuted. Fix: detect the zero-divisor shape -- a product (directly, or via an SSA symbol definition) constrained to 0 whose two factors are each separately constrained non-zero -- and skip algebraic solving. Genuinely-unsat refutations (e.g. cohencu) carry no such pattern. Verification (ablation re-audit, algebra ON vs DISABLE_ALGEBRAIC=1): - assoc.c (BW=9) and mul_zero_factor.c (BW=8): now FAILED, matching bit-blast + z3. - No regression: SMT-COMP 66 identical (31 ARTIFACT / 25 BOTH_TO / 10 NEEDS_ALGEBRA, all 10 unlocks preserved); SABER identical (26 NEEDS_ALGEBRA + 1 ALGEBRA_FASTER); 4 Brain wins still NEEDS_ALGEBRA; custom C wins (mul_overflow, comm_64, mac_equiv, matrix_mul, hash_mul, distrib, mul_square) still SUCCESSFUL and fast (algebra preserved). - groebner unit tests 11/11 pass; smt2_solver regression passes. Added regression tests (expected FAILED) so neither bug can silently return: regression/cbmc/algebraic-soundness-assoc/ regression/cbmc/algebraic-soundness-zero-divisor/ Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

… 14) After the two targeted Item 13 fixes (assoc, mul_zero) were committed, a broad scan ran the solver (algebra ON) on 3,677 SAT bvmul benchmarks from the full QF_BV corpus and flagged any returned `unsat` (the dangerous direction). Result: 76 wrong `unsat`. Splitting by DISABLE_ALGEBRAIC=1: - 41 ALGEBRA-CAUSED (Sage2 x40, sage x1): genuine unsoundness in our algebraic pre-solver, NOT fixed by the two targeted guards. - 35 PRE-EXISTING (all float): wrong `unsat` even with algebra OFF -> a pre-existing CBMC bit-blaster/SMT2 front-end issue, outside the algebraic pre-solver; flagged separately. Root cause is systemic: the Rabinowitsch unit-trick (diff != 0 |-> diff*e-1 = 0, "diff is a unit") is unsound over ZMod(2^d) since non-zero != unit, so UNSAT_rabinowitsch does not imply UNSAT_original. Used at >=3 sites (__rabinowitsch, __rab, __rab_disj), plus a second distinct mechanism (is_zero()/gb-on- equalities) accounts for 4 residual that stay wrong even with all Rabinowitsch sites gated. Feasibility (experimental DISABLE_RABINOWITSCH gate, reverted): gating the 3 Rabinowitsch sites fixes 37/41; SABER + 4 Brain wins survive (sound vanishing path); but cohencu_0..3 + geo3.c_5 regress to timeout (they depend on Rabinowitsch); 4 residual remain. So a naive sound-only mode regains soundness at the cost of ~5 SMT-COMP unlocks and still needs the second mechanism fixed. Docs: - ablation-audit-2026-06-01.md: ADDENDUM with the full scan, root-cause analysis, feasibility data, design options. - remaining-work.md: Item 13 -> PARTIAL (down payment); new Item 14 (CRITICAL, blocks submission) with the systemic analysis, three design options (sound-only ideal-membership; unit-aware Rabinowitsch; Lean reconciliation), and Tier-0 re-prioritisation. - bench-multiplication/ablation-audit/sat-scan-*.txt: the 41 algebra-unsound and 35 pre-existing benchmark lists. The two committed targeted fixes remain correct and tested but are only a down payment. The algebraic pre-solver is unsound by default over ZMod(2^d) on ~41 real benchmarks; the paper's soundness claim cannot stand until Item 14 is resolved. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

Answers the concern: we have a Lean soundness development yet the audit found unsoundness. The proofs are correct; both bugs live OUTSIDE the verified boundary at the encoding seam: - Bug A: the Lean model BVExpr is single-width; cast is same-width identity; the model cannot represent a width-changing cast. The C++ documented this (ASSUMES: casts do not change outer width) but the 'MAINTAINED BY' predicate (set_bitwidth ) did the opposite -- it absorbed wider casts into the narrow ring. Documented assumption, not enforced. - Bug B: the Rabinowitsch disequality encoding (diff!=0 |-> diff*e-1=0) has NO // PROOF: annotation, NO TRACEABILITY row, and is NOT in the BVExpr model (no disequality constructor / no e-variable). It is false over ZMod(2^d) (asserts diff is a unit). GroebnerSoundness.lean even contains two_not_isUnit but filed it under 'incompleteness', never stating the soundness bridge that would have required an IsUnit hypothesis. Why traceability missed it: the bi-directional audit checked that every // PROOF: ref resolves and every row maps to code -- but only over the ANNOTATED set. An unannotated soundness-critical step is invisible to both directions. Item 14 fix will also close the gap: (1) mechanise the disequality-encoding soundness obligation (which fails to prove the unconditional Rabinowitsch step -> forces ideal-membership / IsUnit side-condition); (2) enforce the single-width assumption in code; (3) add a coverage check so every soundness-critical step must be annotated (fail CI otherwise). doc/paper-algebraic/proof-gap-analysis-2026-06-01.md Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

…tsch) The algebraic pre-solver wrongly reported unsat on 41 real SMT-LIB SAT benchmarks (broad-scan finding behind Item 13). Root cause: the Rabinowitsch unit-trick is unsound over ZMod(2^d). Encoding diff != 0 as diff*e - 1 = 0 asserts diff is a UNIT; over ZMod(2^d) a non-zero element need not be invertible (e.g. 2 mod 4), so UNSAT of the Rabinowitsch system does NOT imply UNSAT of the original disequality query. Diagnosis showed 38/41 via the three Rabinowitsch sites and a further 3 (Sage2/bench_{12880,13209,5552}) via a tree-walk leaf equality whose bit-decomposition operators (bitand/lshr) overconstrain the equality system F. Fix (src/solvers/flattening/boolbv.cpp::try_algebraic_solve): - Remove the Rabinowitsch unit-trick at all three sites (the __rabinowitsch push to the combined system, the single __rab per-disequality pass, and the disjunctive __rab_disj pass). - Promote the sound ideal-membership refutation to default-on: after Buchberger, reduce each diff = lhs-rhs modulo the Groebner basis of the equalities F; remainder 0 => diff in <F> => diff = 0 on every common zero of F => the disequality is UNSAT. Sound over any commutative ring (no unit assumption). The per-disequality vanishing test is retained (also sound). - Add a contains_bit_decomp filter so tree-walk leaf equalities containing bitand/bitor/bitxor/lshr/ashr/bitnot are not fed into F (sound: underconstraining F never turns SAT into UNSAT). - Annotate every UNSAT-conclusion site with a // PROOF: reference. Proof gap closed: - formal-proofs/DisequalityRefutation.lean (zero sorry): diseq_refutation_sound / diseq_refutation_unsat (ideal membership => sound refutation, any commutative ring) and rabinowitsch_unsound_over_zmod (concrete Z_4 witness: 2 != 0 yet no e with 2e = 1), plus ideal_membership_declines_witness. - scripts/check_proof_traceability.py: fails if any // PROOF: reference does not resolve to a real Lean theorem, or if any UNSAT conclusion in try_algebraic_solve lacks a // PROOF: annotation. On first run it caught four pre-existing DRIFTED references (renamed theorems / wrong file) that the manual audit had missed; those are corrected here (smt2_parser, poly_ring, poly_extract, boolbv). - TRACEABILITY.md rows added. Verification: - Broad SAT scan (3,677 SAT bvmul benchmarks): 0 non-float wrong-unsat (was 41). The 35 float wrong-unsat are pre-existing (reproduce with DISABLE_ALGEBRAIC=1) -- separate CBMC bit-blaster/SMT2 issue. - SABER 4/4 and Brain 4/4 preserved; cohencu_0/1 preserved via the sound ideal-membership path (Bitwuzla confirms unsat). - Lost: cohencu_2, cohencu_3, geo3.c_5 (3 SMT-COMP unlocks) -- the accepted soundness cost. SMT-COMP NEEDS_ALGEBRA 10 -> 6. - groebner unit tests 11/11; assoc/mul_zero still FAILED; new regression test regression/smt2_solver/algebraic-rabinowitsch- soundness passes; coverage check passes; build clean, no warnings. Outstanding (Item 14b, docs): the paper still presents and claims soundness for the Rabinowitsch trick (paper.tex Theorem proof sketch, contributions table, disequality prose) and must be rewritten. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

…er rewrite Traced where the (now-removed) Rabinowitsch trick came from, to inform the Item 14b paper rewrite. It was not invented here: it is the classical Nullstellensatz device (f != 0 |-> exists y. y*f = 1), adopted directly from the explicit future-work suggestion of the paper we build on, Song et al. 2024 (Equational Bit-Vector Solving via Strong Groebner Bases), Section 7: "extend our methods to more bit-vector arithmetics, such as bitwise operations and inequalities by, e.g., the Rabinowitsch trick [19, Ch. 4.2, Prop. 8], [29]." Both references they cite are FIELD settings where the trick is sound: [19] Cox-Little-O'Shea, Ideals, Varieties, and Algorithms, Ch. 4 (Nullstellensatz over an algebraically closed field); [29] Hader-Kaufmann-Kovacs, SMT solving over finite field arithmetic. Consequence for the paper: this is a scholarly CORRECTION, not just a disclaimer. We are the first to implement that suggested extension in the finite-RING setting ZMod(2^d), and we show it is unsound there (ZMod(2^d) is not a field; the trick encodes "f is a unit", and a nonzero non-unit such as 2 in Z_4 breaks it -- see DisequalityRefutation.lean::rabinowitsch_unsound_over_zmod). The cited field results remain correct; we caution against importing them to finite rings, and use the sound ideal-membership / vanishing refutation that the SCA multiplier-verification literature uses. - Added references: cox2015ideals, hader2023finitefield. - remaining-work.md Item 14b: provenance + recommended framing. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

…e width bug Replaces the incomplete ideal-membership-only disequality handling with Song et al.'s ring-aware encoding z*(a-b) - 2^{d-1} = 0, which is equisatisfiable with a != b over ZMod(2^d) (Song et al. Prop. 5/6) and hence sound AND complete for the refutation (UNSAT) direction. Added at the single- and disjunctive-disequality sites in try_algebraic_solve; the vanishing test remains as a fast sound pre-check. Root-caused and fixed a PRE-EXISTING soundness bug exposed by the encoding (and also latent in the previous ideal-membership path): extract_predicate could emit the spurious constant `1` (i.e. 1 = 0) for a width-inconsistent signed comparison. A signed `0 <s x` is canonicalised to `x >=s 1`, sign-XOR-transformed into a lower bound whose constant (2^31+1) was computed at width 32 while the ring width was d=16; 2^31+1 >> 2^16 tripped the "bound impossible" branch, poisoning F so the main Groebner check reported a spurious UNSAT. Fix: a faithful d-bit constant always satisfies C < 2^d, so if C >= 2^d the operands are width-inconsistent and we return nullopt (defer to bit-blasting) instead of emitting an unsound constant. Never fires on well-formed predicates (no completeness loss). Completeness improvement: has_constant now refutes on ANY non-zero constant, not only odd/unit ones. A constant evaluates to itself under every assignment but every ideal element must evaluate to 0, so a non-zero constant (even an even one) admits no solution. This is needed for completeness of the 2^{d-1} encoding, whose refutation witness may be even. Lean (formal-proofs/DisequalityRefutation.lean, zero sorry): - nonzero_constant_no_solution: any non-zero constant in the ideal => no solution (generalises the odd/unit case; backs has_constant). - song_encoding_equisat: Song Prop. 5/6, u != 0 iff exists z, z*u = 2^{d-1}, with the forward direction proved by the 2-adic valuation argument (u = 2^a * odd, a < d) reusing the odd=>unit fact. Verification: - Broad SAT scan (3,677 SAT bvmul benchmarks): 0 non-float wrong-unsat with full Song enabled. The 35 float wrong-unsat are pre-existing (reproduce with DISABLE_ALGEBRAIC=1), a separate bit-blaster issue. - SABER, Brain, and cohencu_0/1 (Bitwuzla-confirmed unsat) preserved. cohencu_2/3 + geo3.c_5 remain undecided (time out) -- they are unverifiable (z3/cvc5/Bitwuzla all time out at 300s; declared unknown) and their old "unsat" came from the unsound unit-trick, so not claiming them is correct. - groebner unit tests 11/11 (the even-constant case now correctly UNSAT); cbmc + smt2_solver regression tests pass; coverage check passes; build clean, no warnings. Soundness is established; this is a sound UNSAT pre-solver (not a standalone decision procedure) -- non-refuted queries still defer to bit-blasting, which provides overall completeness for QF_BV. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

3-assert ddmin minimisation (from Sage2/bench_2155, declared sat) of the spurious-constant bug fixed in 2ff48a05bf, with a README explaining the root cause and fix. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

…nowitsch framing) Updates the algebraic paper to reflect the implemented soundness fix (Item 14). The textbook field Rabinowitsch trick is unsound over ZMod(2^d); we adopt Song et al.'s ring-aware z*(a-b) - 2^{d-1} = 0 encoding, which is equisatisfiable with a != b (Song Prop. 5/6). - Replaced the false "Soundness" theorem/proof (which claimed the unit-trick sound) with Theorem (Song Prop. 5/6): u != 0 iff exists z, z*u = 2^{d-1}, plus the non-zero-constant refutation criterion, citing the mechanised DisequalityRefutation.lean theorems. - Added a remark documenting that the textbook Rabinowitsch trick is field-only (sound over fields per cox2015ideals / hader2023finitefield / Ozdemir23cvc5; unsound over the finite ring ZMod(2^d)), and that a prototype using it was unsound on real benchmarks. - Contributions table + list, the Buchberger algorithm pseudocode (now refutes on ANY non-zero constant, not only odd), the two worked examples (commutativity, cohencu), the evaluation prose, the Lean soundness-contributions item, and the conclusion all updated from "Rabinowitsch" to Song's 2^{d-1} encoding. - Related work: new "Scaling to large bitwidths" paragraph citing Hofstadler et al. (multimodular homomorphic images) and the SCA multiplier line (kaufmann2019, ritirc2017/2018), with an honest limitation: multimodular CRT does not transfer to native ZMod(2^d) reasoning (prime-power modulus; Boolean-gate model equivalence). - Bib: added cox2015ideals, hader2023finitefield (earlier), and kaufmann2019, ritirc2017, ritirc2018, hofstadler2026. - Corrected the now-stale Lean theorem count (135 -> over 150) to reflect the added DisequalityRefutation theorems. Paper compiles cleanly (no undefined refs/citations). Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

…w=16) The cohencu-style-quadratic smt2_solver test expected UNSAT, but at bw=32 that UNSAT was only reachable via the textbook Rabinowitsch unit-trick removed in Item 14 (unsound over Z_{2^d}). With Song et al.'s sound z*(a-b) - 2^{d-1} encoding the refutation no longer completes within the regression budget at bw=32 (a completeness limitation relative to the earlier unsound shortcut), so the test hung. The query is genuinely UNSAT at any bitwidth (substituting the two constraints makes (z*z+12) - (6z+12y) vanish identically). Reducing to bw=16 keeps deterministic, oracle-verified (z3 + cvc5 both UNSAT) coverage of the F4 tail-reduction / non-leading-term path while remaining within budget. Comment updated to describe Song's encoding and the bw=32 limitation. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

…qual() Commit d06d50c678 added redundant clauses (eq[i] OR eq[i+1]) for 10-13 bit equality checks, on the premise that "if bit i differs, the adjacent bit must be equal". That premise is false: two adjacent bits can both differ (e.g. op0=0b00, op1=0b11 has eq[0]=eq[1]=false), so the clause over-constrains the SAT instance and makes satisfiable formulas containing 10-13 bit equality checks spuriously UNSAT. Found via a corpus soundness sweep: 98/98 declared-sat mcm/ benchmarks and a family of float/ benchmarks returned unsat where z3, cvc5 and upstream all return sat; git bisect identified d06d50c678 as the first bad commit. Removing the clause flips all of them back to sat. Regression test distilled (via ddsmt) from mcm/06. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

The assert command defers all assertions into deferred_assertions, which are only encoded (solver.set_to_true) at solving time. check-sat flushes them, but check-sat-assuming never did, so the entire regular assertion stack was silently ignored under check-sat-assuming: e.g. (assert false) (check-sat-assuming ()) wrongly returned sat. Flush deferred_assertions in check-sat-assuming exactly as check-sat does, before adding the temporary assumptions. Covered by the existing regression test basic-bv1/check-sat-assuming1 (previously failing). Minimal reproducer distilled via ddsmt. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

…nctions Deferred bit-blasting collects assertions and replays them in finish_eager_conversion() when the algebraic pre-solver does not refute. The replay happened after functions.finish_eager_conversion(), which generates the congruence axioms for uninterpreted-function applications from the set of applications seen so far. A term occurring only inside a deferred assertion (e.g. m(zero_extend(x))) was therefore registered too late to receive its congruence axioms, dropping constraints and yielding spurious sat. For example: (assert (= #x00000001 (m #x00000000))) (assert (= #x00000000 ((_ zero_extend 31) x))) ; forces x = 0 (assert (= #x00000000 (m ((_ zero_extend 31) x)))) is unsat (m(0) = 1 and, by congruence, m(zero_extend(x)) = m(0) = 0) but was reported sat. Move the replay before functions.finish_eager_conversion() (array axioms in SUB::finish_eager_conversion already run after the replay, so they were unaffected). The algebraic solve still runs first so a successful refutation still short-circuits the replay. This fixes the bug at the source; DISABLE_DEFER_BITBLAST=1 is no longer needed. Regression distilled via ddsmt from uninterpreted-functions/uf1. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

…type The Level-3 candidate-assignment hint in try_algebraic_solve rebuilt each variable's symbol as unsignedbv_typet(bw), where bw is the ring bitwidth of the Groebner extractor. For a variable whose real width differs from bw this created a width-inconsistent entry in boolbv_map: the subsequent normal conversion of an assertion mentioning that symbol then tripped the get_literals invariant (literal_map.size() == width) and aborted. This crashed cbmc (SIGABRT) on Initialization6 and havoc_slice/test_struct_* whenever the algebraic layer was active. The hint is a soft, gate-retractable guidance clause, so it is sound to reconstruct the symbol with its real type (collected from the typed equation expressions) and to skip any variable whose width does not match the ring width. Now no width-inconsistent map entry is ever created. Verified: Initialization6 and havoc_slice/test_struct_{a,c,d} run to completion again with the algebraic layer enabled; algebraic-soundness tests and the full smt2_solver suite still pass. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

…ttribution Adds soundness-sweep-findings-2026-06-03.md documenting the corpus-scale differential sweep, the four bugs it found (with ddsmt-minimised reproducers and git-bisect root causes), and the fixes (8a6f7b2ef1, 9c937299ab, c1ceecbdd0, 7fab6c5d8f). Key correction: the float/mcm wrong-unsat results were previously described in ablation-audit-2026-06-01.md as "pre-existing CBMC bit-blaster" issues. A merge-base baseline build returns the correct sat, and git bisect attributes them to branch commit d06d50c678 (an unsound adjacent-equality SAT-encoding optimisation), now fixed. Added correction notes at both spots in the audit doc. The algebraic pre-solver itself remains sound: 0 wrong answers attributable to it across the full bvmul sat corpus (4,361) plus a 4,000-file cross-corpus sample. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

…suite + per-tweak ablation Re-ran the evaluation with the now-sound binary (after the four soundness/robustness fixes) and refreshed paper.tex: - Random-polynomial: replaced the 210-benchmark random sample (not reproducible -- the harness used shuf) with the ENTIRE suite, all 3498 benchmarks across both seeds and eight categories. Solved counts: shift-add 1972, Comba/CS 1970, algebraic 2128, Bitwuzla 3021, cvc5 2678 (union 3104). Wins beyond all solvers: 59 (was 4 on the sample); over bit-blasting: 257 (was 18). Per-category table updated to full-suite denominators. Removed the stale per-benchmark timing table (and the 0.03-0.20 vs 0.3-3.7s discrepancy). Soundness: 0 sat-vs-unsat disagreements and 0 verdicts contradicting declared :status across all 3498 x 5 configurations. - SMT-COMP sample (66): refreshed to shift-add 37/66 (PAR-2 57.1) and algebraic 43/66 (PAR-2 44.8); 0 disagreements; still overtakes cvc5 and unlocks 8 benchmarks bit-blasting cannot decide. - Added a leave-one-out per-tweak ablation (Table tab:tweak-ablation) on a 26-benchmark cross-suite subset: Tseitin propagation (4 wienand), the vanishing test (3), and the boolean tree walk (geo3.c_5) are load-bearing for solvability; the finer tweaks (F4 interreduction, deferred bit-blasting, if-case elimination, nonzero fast-path, GB ordering) affect only running time. - SABER 31/31, custom 39/39, DSP 5/5 re-confirmed (0 disagreements). Raw results added under doc/paper-algebraic/data/. Paper compiles cleanly (46 pages, no undefined refs). Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

…tive, validation methodology) Reconciles the paper with the post-fix code and resolves reviewer-flagged inconsistencies: - Stale numbers: intro and conclusion "4 wins" -> 59 (full-suite result); conclusion "18 Lean modules" -> 20 (matches the status summary and the actual module count). - "No correctness regressions / 691 tests": corrected to the accurate CORE count (1167), states zero soundness/correctness regressions with the algebraic layers enabled, and notes the two remaining failures are non-algebraic timeouts (reproduce with the layer disabled). Added a validation-methodology paragraph (mechanised proof + corpus-scale differential sweep vs z3/cvc5/baseline) and disclosed that the sweep found and fixed integration bugs (the get_literals width abort). - odd/unit-constant -> any non-zero constant: swept the prose and Algorithm-1 narrative; reframed the StrongGB negative results (naive_completeness_is_false, F={C 2}) as the motivation for generalising has_constant from odd-only to any non-zero constant (sound by nonzero_constant_no_solution), under which F={C 2} is now decided UNSAT; residual incompleteness (function-vanishing, budget) retained. Updated StrongGB.lean header to match. - Defer.lean: documented the faithfulness ASSUMES (verdict depends on the committed set, not replay order) and that a real bug (UF-congruence replay ordering, found by differential testing, not by the proof) violated it; the fix restores it. Added as the third ASSUMES/MAINTAINED-BY story in the conclusion and proofs section. - Theorem 1 (vanishing): made domain width n vs codomain width m explicit and scoped the small-n case to the mixed-width remark. - SABER N=256: reconciled 11.25 s (Table saber-scaling) vs 13.4 s (Table saber-defer) as uninstrumented vs RSS-instrumented runs. - Clarified the 150 theorems / 65 mapping entries / 37 source entities denominators. Paper compiles cleanly (47 pages, no undefined refs). Lean edits are docstring-only. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

These are patch/merge backup files (*.orig) that were committed by accident; they are duplicates/leftovers of their live tracked counterparts (or have no counterpart) and should never have been tracked. Removing them; the live files are unaffected. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

Prevents recurrence of the accidentally-committed build-cov/build-debug trees, *.gcov/*.gcno coverage output, and *.orig patch backups. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

Updates REPRODUCE.md to match the refreshed paper: the full 3498-benchmark random-polynomial run (replacing the old 210 sample), the leave-one-out per-tweak ablation, the corpus differential soundness sweep, the refreshed SMT-COMP/SABER/custom numbers, the complete DISABLE_* ablation-variable table, and pointers to the new data/*.tsv files and the soundness-sweep findings doc. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

…smoke vs z3) Wires two guards recommended in the paper: - proof-traceability lint: runs scripts/check_proof_traceability.py (no build) so every PROOF: reference resolves and every UNSAT conclusion in try_algebraic_solve stays annotated; - soundness smoke: builds smt2_solver and runs the algebraic regression queries (adjacent-equality, deferred-replay, rabinowitsch, cohencu-F4, check-sat-assuming), asserting each verdict matches the expected result AND agrees with z3 -- the wrong-verdict cross-check the standard output-matching regression harness does not provide. New script scripts/algebraic_soundness_smoke.sh (also runnable locally). Triggers on changes to the algebraic source, boolbv/bv_utils/smt2, the proofs, and the smt2_solver regressions. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

The deferred-bit-blasting replay in finish_eager_conversion called SUB::set_to directly, bypassing the boolbv_set_equality_to_true optimisation that the eager (non-deferred) tail of set_to applies to equalities. For queries with large equalities and no algebraic content (e.g. union_large_array's 100k-element array union) this fully bit-blasted the equality on replay, producing a far harder SAT instance than the eager path and timing out (>300s vs <1s eager). git bisect identified the deferred-bit-blasting default-on commit (d6f3c405a4) as the first bad commit for this regression. Replay now inlines the post-deferral tail of set_to (the boolbv_set_equality_to_true fast path) so deferred replay reproduces the eager clause structure, not just the eager models. union_large_array returns to ~1s; algebraic regression, SABER, and the smt2_solver suite are unaffected. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

…rray) as found+fixed Updates the validation paragraph: the union_large_array CORE timeout was root-caused (deferred replay bypassed the boolbv_set_equality_to_true eager optimisation, fully bit-blasting a 100k-element array-union equality; bisected to the deferral-default commit) and fixed by applying the equality optimisation on replay. Only a pre-existing thorough-tagged constant-array timeout remains (also times out on the baseline). Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

…2 45.8) Re-measured the two cbmc rows of the SMT-COMP sample under the paper's 6GB memory limit (the prior 43/44.8 used 12GB): shift-add 37/66 (57.1), algebraic 42/66 (45.8) -- one benchmark needs >6GB. Numbers confirmed identical before and after the deferred-replay equality fix (no large arrays in this sample). Caption now states the 6GB limit; raw data in data/smtcomp-sample-{shiftadd,algebraic}-6gb.tsv. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

…ding) - Combined-effect: replace the misleading 'unlock eight (37/66 -> 42/66)' with the accurate net: algebra decides 8 that shift-add cannot while shift-add decides 3 the algebraic config times out on (net +5). - Conclusion: 'a third bug' -> 'further integration bugs ... most tellingly the deferred-replay issue', consistent with the three differential-found fixes documented in Threats. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

tautschnig force-pushed the features/adder branch from e86df44 to 1a8044c Compare November 3, 2023 08:55

tautschnig self-assigned this Nov 16, 2023

tautschnig force-pushed the features/adder branch from 1a8044c to 1fd90bb Compare April 1, 2026 20:20

tautschnig mentioned this pull request May 31, 2026

[WIP] Array theory improvements #8905

Draft

7 tasks

tautschnig force-pushed the features/adder branch from 1fd90bb to 76757cc Compare June 8, 2026 18:38

tautschnig and others added 24 commits June 8, 2026 19:11

Paper 1: mirror §6.4 narrative restoration

a5e9946

Same fix as multiplier-encodings.git: restore the five per-row mechanistic explanations of which encoding wins on which pattern, plus the adaptive-selection consequence. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

Save next-steps snapshot to research-plans/NEXT-STEPS.md

f17af10

Captures the research-plan status (N1/N2/N3/N4), paper status, and recommended prioritisation after the senior-review passes and the Wallace integration. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

tautschnig and others added 29 commits June 8, 2026 19:11

Doc: mark Item 14b (paper disequality-methodology rewrite) DONE

5f27ff4

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

gitignore: ignore build-*/, coverage output, and .orig backups

e9c24e3

Prevents recurrence of the accidentally-committed build-cov/build-debug trees, *.gcov/*.gcno coverage output, and *.orig patch backups. Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>

tautschnig force-pushed the features/adder branch from 76757cc to 00f468b Compare June 8, 2026 19:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Experimenting with different adder implementations#7993

Experimenting with different adder implementations#7993
tautschnig wants to merge 432 commits into
diffblue:developfrom
tautschnig:features/adder

tautschnig commented Nov 2, 2023 •

edited

Loading

Uh oh!

codecov Bot commented Nov 3, 2023 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

tautschnig commented Nov 2, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov Bot commented Nov 3, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

tautschnig commented Nov 2, 2023 •

edited

Loading

codecov Bot commented Nov 3, 2023 •

edited

Loading