Replace the expensive per-candidate topological DFS in merge scoring with bitset union + popcount (O(n/64) instead of O(subgraph size)). Pre-compute node bitsets during initial partition analysis and update them incrementally on merge.

Key changes:
- Add TopoTraverser with a dense Vec<u32> visited buffer using the generation-counter pattern, replacing IndexSet-based DFS (aig.rs)
- Convert recursive DFS to iterative stack-based DFS to avoid stack overflow on deep AIGs and improve cache locality
- Add bitset_union_popcount/bitset_or_inplace helpers for merge scoring
- Add Partition::quick_reject() pre-check to skip obviously infeasible merges before expensive hierarchy construction
- Add a cancel-on-success AtomicBool to speculative parallel trials so in-progress build_one() calls bail early when another trial succeeds
- Add build_one_cancellable(), which checks the cancel flag between boomerang stages
- Extract a collect_comb_outputs() helper and hoist it out of the inner loop
- Update CLAUDE.md to document the Metal backend and benchmarks

Tested on NVDLA benchmark (254MB netlist): 316 initial → 55 merged partitions in 11m51s wall clock.

Co-developed-by: Claude Code v2.1.42 (claude-opus-4-6)
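The bitset-based merge scoring described above can be sketched as follows. This is a minimal, hypothetical sketch: only the helper names `bitset_union_popcount` and `bitset_or_inplace` come from the commit message; their slice-of-`u64` signatures and the `words_for`/`set_bit` helpers are assumptions for illustration. Scoring a candidate merge costs one pass over n/64 words, no matter how large the two subgraphs are.

```rust
/// Number of u64 words needed to hold `n` node bits.
fn words_for(n: usize) -> usize {
    (n + 63) / 64
}

/// Mark node `i` as a member of the set.
fn set_bit(bits: &mut [u64], i: usize) {
    bits[i / 64] |= 1u64 << (i % 64);
}

/// Size of the union of two node sets, without materializing the union.
/// Cost is O(n/64) word operations regardless of subgraph size.
fn bitset_union_popcount(a: &[u64], b: &[u64]) -> u32 {
    debug_assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(x, y)| (x | y).count_ones()).sum()
}

/// Fold `src` into `dst`; applied once, when a merge is accepted.
fn bitset_or_inplace(dst: &mut [u64], src: &[u64]) {
    debug_assert_eq!(dst.len(), src.len());
    for (d, s) in dst.iter_mut().zip(src) {
        *d |= *s;
    }
}

fn main() {
    let n = 200; // illustrative node count
    let mut a = vec![0u64; words_for(n)];
    let mut b = vec![0u64; words_for(n)];
    for i in [1, 5, 64, 130] {
        set_bit(&mut a, i);
    }
    for i in [5, 64, 199] {
        set_bit(&mut b, i);
    }
    // Nodes 5 and 64 are shared, so the union is {1, 5, 64, 130, 199}.
    println!("candidate merge size = {}", bitset_union_popcount(&a, &b)); // 5
    bitset_or_inplace(&mut a, &b);
    let after: u32 = a.iter().map(|w| w.count_ones()).sum();
    println!("size after merge = {}", after); // 5
}
```

Scoring only reads both bitsets; the in-place OR runs once per accepted merge, which is what keeps the precomputed bitsets current without recomputing them from the graph.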
…ures, parallel flatten

- Pass prebuilt Partition objects from cut_map_interactive to process_partitions, eliminating ~316 redundant build_one() calls for NVDLA
- Replace IndexSet-based topo_traverse_generic with dense TopoTraverser at all hot call sites (pe.rs, repcut.rs, staging.rs)
- Replace IndexMap id2order with Vec<usize> in build_one_boomerang_stage for direct O(1) lookups instead of hash-based access
- Replace IndexMap hier_visited_nodes_count with Vec<usize> + active_nodes list for O(1) contains/increment instead of hash-based entry()
- Add dense Vec<bool> shadows for realized_inputs and unrealized_comb_outputs in build_one_cancellable for fast contains() checks in inner loops
- Parallelize init_afters_writeouts and build_script in flatten.rs with rayon

Co-developed-by: Claude Code v2.1.42 (claude-opus-4-6)
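A dense traverser with a generation counter and an explicit stack might look like the sketch below. The struct layout, the `traverse` signature, and the adjacency-list `fanins` representation are assumptions, not the code in aig.rs. Bumping `generation` invalidates every `visited` entry in O(1), so one buffer can be reused across many traversals without clearing or hashing, and the explicit stack removes the recursion-depth limit on deep AIGs.

```rust
/// Hypothetical dense traverser: `visited[i] == generation` means node `i`
/// was already reached in the current traversal.
struct TopoTraverser {
    visited: Vec<u32>,
    generation: u32,
    // Explicit DFS stack of (node, index of the next fanin to visit).
    stack: Vec<(usize, usize)>,
}

impl TopoTraverser {
    fn new(num_nodes: usize) -> Self {
        TopoTraverser { visited: vec![0; num_nodes], generation: 0, stack: Vec::new() }
    }

    /// Nodes reachable from `roots` via `fanins`, fanins before fanouts.
    fn traverse(&mut self, fanins: &[Vec<usize>], roots: &[usize]) -> Vec<usize> {
        // Starting a new generation "clears" `visited` in O(1).
        self.generation = self.generation.wrapping_add(1);
        if self.generation == 0 {
            // Counter wrapped around: do one real reset, then continue.
            self.visited.fill(0);
            self.generation = 1;
        }
        let cur = self.generation;
        let mut order = Vec::new();
        for &root in roots {
            if self.visited[root] == cur {
                continue;
            }
            self.visited[root] = cur;
            self.stack.push((root, 0));
            while let Some(&(node, next)) = self.stack.last() {
                if next < fanins[node].len() {
                    // Advance this frame's cursor before possibly descending.
                    self.stack.last_mut().unwrap().1 += 1;
                    let child = fanins[node][next];
                    if self.visited[child] != cur {
                        self.visited[child] = cur;
                        self.stack.push((child, 0));
                    }
                } else {
                    // All fanins already emitted: emit this node (post-order).
                    self.stack.pop();
                    order.push(node);
                }
            }
        }
        order
    }
}

fn main() {
    // Nodes 0 and 1 are inputs; node 2 = AND(0, 1); node 3 = AND(2, 1).
    let fanins: Vec<Vec<usize>> = vec![vec![], vec![], vec![0, 1], vec![2, 1]];
    let mut t = TopoTraverser::new(fanins.len());
    println!("{:?}", t.traverse(&fanins, &[3])); // [0, 1, 2, 3]
    // Reusing the same buffer: the new generation makes every node unvisited again.
    println!("{:?}", t.traverse(&fanins, &[2])); // [0, 1, 2]
}
```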
af361f9 to 0385a7a
Summary
- Add `TopoTraverser` with a dense visited buffer and iterative stack-based DFS, replacing the `IndexSet`-based recursive DFS
- Add `Partition::quick_reject()` pre-check to skip obviously infeasible merges before expensive hierarchy construction
- Add a cancel-on-success `AtomicBool` so speculative parallel `build_one()` trials bail early when another succeeds

Test plan
- `cargo check -r --features metal` compiles cleanly
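The cancel-on-success flag from the summary can be sketched with standard-library scoped threads. The PR uses rayon for parallelism; `std::thread::scope` is swapped in here only to keep the example dependency-free. `run_stage`, `run_trials`, and this `build_one_cancellable` signature are hypothetical stand-ins; the point is that each trial polls a shared `AtomicBool` between stages and the first trial to finish raises it.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

/// Hypothetical stand-in for one boomerang stage of a trial.
/// In this toy setup only trial 2 passes every stage; others fail at stage 3.
fn run_stage(trial: usize, stage: usize) -> bool {
    trial == 2 || stage < 3
}

/// Sketch of a cancellable build: the shared flag is polled between stages.
/// Returns `Some(trial)` on success, `None` if the trial failed or was cancelled.
fn build_one_cancellable(trial: usize, num_stages: usize, cancel: &AtomicBool) -> Option<usize> {
    for stage in 0..num_stages {
        // Relaxed is enough: the flag only signals "stop"; results travel via join().
        if cancel.load(Ordering::Relaxed) {
            return None;
        }
        if !run_stage(trial, stage) {
            return None;
        }
    }
    // Signal in-flight siblings to bail at their next check.
    cancel.store(true, Ordering::Relaxed);
    Some(trial)
}

/// Run all trials in parallel and collect the ones that succeeded.
fn run_trials(num_trials: usize, num_stages: usize) -> Vec<usize> {
    let cancel = AtomicBool::new(false);
    thread::scope(|s| {
        let handles: Vec<_> = (0..num_trials)
            .map(|trial| {
                let cancel = &cancel;
                s.spawn(move || build_one_cancellable(trial, num_stages, cancel))
            })
            .collect();
        handles.into_iter().filter_map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    // Four speculative trials of five stages each.
    println!("{:?}", run_trials(4, 5)); // [2]
}
```

Two trials that pass their final check at the same moment can both succeed; if exactly one winner is required, `cancel.swap(true, Ordering::Relaxed)` returning `false` identifies the first one.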