Merged
52 commits
1decff8
Add GPU arrival time readback for timing-aware VCD output
robtaylor Feb 27, 2026
39bb667
Document latch-based design limitation and support requirements
robtaylor Feb 27, 2026
3e22fe8
Fix timing arrival unit tests and make write_output_vcd_timed generic
robtaylor Feb 27, 2026
dd9ad4e
Fix Metal kernel double-buffer bug reading gate_delay from wrong stage
robtaylor Feb 27, 2026
1c31d64
Fix compiler warnings in gem lib (unused variables, dead assignments)
robtaylor Feb 27, 2026
627a898
Add CVC reference testbench and timing VCD comparison tools
robtaylor Feb 27, 2026
e01d8f9
Add Docker-based CVC reference simulation for local timing validation
robtaylor Feb 27, 2026
7fb5467
Document conservative timing model and CVC validation results
robtaylor Feb 27, 2026
718e42c
Add multi-depth synthetic test for timing-aware bit packing
robtaylor Feb 27, 2026
be8550b
Fix SDF extraction to prefer post-PnR over pre-PnR timing
robtaylor Feb 27, 2026
82b1a03
Update MCU SoC test data from librelane rebuild
robtaylor Feb 27, 2026
231c6cd
Add --stimulus-vcd option to cosim for primary input capture
robtaylor Feb 28, 2026
1126c41
Add dlymetal gate and diode cell support to SKY130 library
robtaylor Feb 28, 2026
6f8f49b
Sort boomerang level-1 endpoints by logic level for timing-aware packing
robtaylor Feb 28, 2026
a3518a1
Add unary NOT (~) support to structural Verilog parser
robtaylor Feb 28, 2026
cf4770d
Make gemparts optional for sim and cosim commands
robtaylor Feb 28, 2026
d4f1722
Fix stimulus VCD format and add SKY130 INV cell support
robtaylor Feb 28, 2026
63e622c
Add CVC SDF-annotated simulation for Loom comparison
robtaylor Feb 28, 2026
4d10664
Remove loom map command; always generate partitions inline
robtaylor Feb 28, 2026
4a20c8c
Fix CVC CI: build with -O0 to avoid SDF annotation segfault
robtaylor Mar 1, 2026
d2d6854
Remove timing_sim_cpu binary and all references
robtaylor Mar 1, 2026
f9754bf
Add automated CVC vs Loom timing comparison in CI
robtaylor Mar 1, 2026
5902234
Add MCU SoC post-layout timing comparison in CI
robtaylor Mar 1, 2026
f686d5a
Fix gen_cell_models.py sky130 path for CI
robtaylor Mar 1, 2026
e35914a
Fix gen_cell_models.py PEP 723 metadata header
robtaylor Mar 1, 2026
b766253
Handle SDF parse failure gracefully in MCU SoC CI
robtaylor Mar 1, 2026
e7e6709
Fix SDF parser: handle empty delay specs ()
robtaylor Mar 1, 2026
bac4244
Fix MCU SoC timing VCD generation: strip SDF timing checks before GPU…
robtaylor Mar 1, 2026
a278d63
Add timing validation methodology documentation
robtaylor Mar 1, 2026
c60ca50
Fix SDF parser: handle COND pin specs and add resilient timing check …
robtaylor Mar 1, 2026
9f613c5
Fix CVC stimulus: map Jacquard port names to gpio_in[N] for CVC testb…
robtaylor Mar 1, 2026
df137af
Fix MCU SoC comparison: correct CVC artifact download path
robtaylor Mar 2, 2026
5ce74e9
Fix MCU SoC comparison: map Jacquard output port names to gpio_out in…
robtaylor Mar 2, 2026
539df6a
Add pre-layout timing test infrastructure with Liberty-only SDF
robtaylor Mar 2, 2026
f448e43
Document timing validation methodology with pre-layout test cases
robtaylor Mar 2, 2026
d5cdf58
Document implementation plan for enabling --timing-vcd on CUDA/HIP
robtaylor Mar 2, 2026
1710e00
Document implementation plan for cosim timing support (step 8)
robtaylor Mar 2, 2026
6658655
Connect --liberty CLI flag to timing pipeline for pre-layout timing
robtaylor Mar 2, 2026
494526e
Fix load_timing to use aigpin_cell_origins for SKY130 designs
robtaylor Mar 2, 2026
1d9aebd
Fix MCU SoC comparison: use non-timed replay for CVC validation
robtaylor Mar 2, 2026
9aa1dc4
Add --skip-bits to compare_outputs.py for SPI flash pin exclusion
robtaylor Mar 2, 2026
5202c5e
Add constant_ports (POR signals) to MCU SoC cosim configuration
robtaylor Mar 3, 2026
54a23a9
Fix constant-D DFF Q aliased to posedge flag position
robtaylor Mar 4, 2026
69522cb
Milestone: MCU SoC cosim working end-to-end on Metal GPU
robtaylor Mar 4, 2026
ec918b0
Fix UART decoder cycles_per_bit: apply 2x factor for correct baud rate
robtaylor Mar 4, 2026
543533b
Make UART boot output check mandatory in CI
robtaylor Mar 4, 2026
c636dee
Remove stale chipflow-examples submodule from git index
robtaylor Mar 4, 2026
f622396
Remove stale root-level submodule entries from git index
robtaylor Mar 4, 2026
0e0dea1
Enable --timing-vcd on CUDA/HIP: wire arrival_state_offset to GPU ker…
robtaylor Mar 4, 2026
cd3b65a
Add --timing-vcd to cosim for native timing-accurate VCD output
robtaylor Mar 4, 2026
b9a81ac
Rename compare_outputs.py → compare_simulation.py, add timing comparison
robtaylor Mar 4, 2026
fc6da45
Add SDF timing path tracer tool for debugging timing discrepancies
robtaylor Mar 4, 2026
458 changes: 360 additions & 98 deletions .github/workflows/ci.yml

Large diffs are not rendered by default.

16 changes: 13 additions & 3 deletions .github/workflows/mcu-soc-rebuild.yml
@@ -133,12 +133,22 @@ jobs:
 cp "$LAST_NL" tests/mcu_soc/data/6_final_raw.v
 fi

-# SDF timing (nom corner)
+# SDF timing — prefer post-PnR (stapostpnr) over pre-PnR (staprepnr)
+# Use nom_tt (typical) corner for simulation
 if ls "$RUN_DIR"/final/sdf/*.sdf 1>/dev/null 2>&1; then
 cp "$RUN_DIR"/final/sdf/*.sdf tests/mcu_soc/data/6_final.sdf
 else
-LAST_SDF=$(find "$RUN_DIR" -name '*.sdf' 2>/dev/null | sort -r | head -1)
-[ -n "$LAST_SDF" ] && cp "$LAST_SDF" tests/mcu_soc/data/6_final.sdf || true
+POSTPNR_SDF=$(find "$RUN_DIR" -path '*stapostpnr*' -name '*nom_tt*.sdf' 2>/dev/null | head -1)
+PREPNR_SDF=$(find "$RUN_DIR" -path '*staprepnr*' -name '*nom_tt*.sdf' 2>/dev/null | head -1)
+if [ -n "$POSTPNR_SDF" ]; then
+echo "Using post-PnR SDF: $POSTPNR_SDF"
+cp "$POSTPNR_SDF" tests/mcu_soc/data/6_final.sdf
+elif [ -n "$PREPNR_SDF" ]; then
+echo "::warning::Post-PnR SDF not found, using pre-PnR SDF: $PREPNR_SDF"
+cp "$PREPNR_SDF" tests/mcu_soc/data/6_final.sdf
+else
+echo "::warning::No SDF files found"
+fi
 fi

 # SDC constraints
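After the existing `final/sdf` check, the new fallback picks a post-PnR `nom_tt` SDF, then a pre-PnR one, then gives up with a warning. The minimal C++ sketch below models only that fallback as a pure function over candidate paths; it is not code from this PR. `pick_sdf` is a hypothetical name, the example paths are made up, and the first match in list order stands in for `find ... | head -1`.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper modelling the workflow's `find` fallback: prefer a
// post-PnR (stapostpnr) nom_tt SDF, else a pre-PnR (staprepnr) one, else none.
std::optional<std::string> pick_sdf(const std::vector<std::string> &paths) {
    // Roughly mirrors `find -path '*<stage>*' -name '*nom_tt*.sdf' | head -1`:
    // the stage may appear anywhere in the path, while `nom_tt` and the
    // `.sdf` suffix must appear in the file name itself.
    auto first_match = [&paths](const std::string &stage) -> std::optional<std::string> {
        for (const auto &p : paths) {
            const auto slash = p.find_last_of('/');
            const std::string name = (slash == std::string::npos) ? p : p.substr(slash + 1);
            const bool is_sdf = name.size() >= 4 &&
                                name.compare(name.size() - 4, 4, ".sdf") == 0;
            if (is_sdf && name.find("nom_tt") != std::string::npos &&
                p.find(stage) != std::string::npos) {
                return p;  // first hit wins, like `head -1`
            }
        }
        return std::nullopt;
    };

    if (auto post = first_match("stapostpnr")) {
        return post;  // post-PnR timing reflects the routed layout
    }
    return first_match("staprepnr");  // std::nullopt means "No SDF files found"
}
```

With both stages present the post-PnR path wins; a `std::nullopt` result corresponds to the workflow's `::warning::No SDF files found` branch.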
1 change: 1 addition & 0 deletions .pdm-python
@@ -0,0 +1 @@
+/Users/roberttaylor/Code/ChipFlow/Loom/.venv/bin/python
29 changes: 5 additions & 24 deletions CLAUDE.md
@@ -35,10 +35,9 @@ cargo run -r --features hip --bin jacquard -- sim --help

 1. **Memory synthesis** (Yosys): Map memories using `memlib_yosys.txt` → outputs `memory_mapped.v`
 2. **Logic synthesis** (DC or Yosys): Synthesize to `aigpdk.lib` cells → outputs `gatelevel.gv`
-3. **Jacquard mapping**: `jacquard map gatelevel.gv result.gemparts`
-4. **Simulation**: `jacquard sim` with `gatelevel.gv result.gemparts input.vcd output.vcd NUM_BLOCKS`
+3. **Simulation**: `jacquard sim gatelevel.gv input.vcd output.vcd NUM_BLOCKS`

-Set `NUM_BLOCKS` to 2× the number of GPU streaming multiprocessors (SMs) for CUDA, 2× the number of Compute Units (CUs) for HIP/AMD, or 1 for Metal.
+Partitioning happens automatically at simulation start. Set `NUM_BLOCKS` to 2× the number of GPU streaming multiprocessors (SMs) for CUDA, 2× the number of Compute Units (CUs) for HIP/AMD, or 1 for Metal.

 ## Architecture

@@ -65,8 +64,7 @@ NetlistDB (Verilog) → AIG → StagedAIG → Partitions → FlattenedScript →

 ### Binary Tools (`src/bin/`)

-- **`jacquard.rs`**: Unified CLI — `jacquard map` (partition mapping), `jacquard sim` (GPU simulation), `jacquard cosim` (co-simulation)
-- **`timing_sim_cpu.rs`**: CPU-based timing simulation with SDF back-annotation (development tool)
+- **`jacquard.rs`**: Unified CLI — `jacquard sim` (GPU simulation), `jacquard cosim` (co-simulation)
 - **`timing_analysis.rs`**: Static timing analysis utility (development tool)

 ### Dependencies (`vendor/eda-infra-rs` submodule)
@@ -118,15 +116,9 @@ cargo run -r --features metal --bin jacquard -- sim ... --max-cycles 1000
 Pre-synthesized benchmark designs are in `benchmarks/dataset/` (git submodule). See `benchmarks/README.md` for full instructions.

 ```bash
-# Generate partition file (NVDLA - smallest, good for testing)
-cargo run -r --bin jacquard -- map \
-benchmarks/dataset/nvdlaAIG.gv \
-benchmarks/nvdla.gemparts
-
-# Run Metal simulation benchmark
+# Run Metal simulation benchmark (NVDLA - smallest, good for testing)
 cargo run -r --features metal --bin jacquard -- sim \
 benchmarks/dataset/nvdlaAIG.gv \
-benchmarks/nvdla.gemparts \
 benchmarks/dataset/nvdla.pdp_16x6x16_4x2_split_max_int8_0.vcd \
 benchmarks/nvdla_output.vcd \
 1
@@ -158,7 +150,7 @@ uv run netlist-graph path <netlist.v> "<source>" "<target>"
 # Search for nets matching pattern
 uv run netlist-graph search <netlist.v> "<pattern>"

-# Generate watchlist JSON for timing_sim_cpu
+# Generate watchlist JSON for signal monitoring
 uv run netlist-graph watchlist <netlist.v> output.json signal1 signal2 ...

 # Interactive mode for exploration
@@ -177,14 +169,3 @@ uv run netlist-graph path tests/timing_test/minimal_build/6_final.v "gpio_in[40]
 ### Timing Violation Detection

 See `docs/timing-violations.md` for the full guide on enabling GPU-side setup/hold violation checks, interpreting violation reports, and tracing violations back to source signals using `netlist_graph`.
-
-### Timing Simulation with Signal Tracing
-
-```bash
-# Create watchlist and trace signals
-cargo run -r --bin timing_sim_cpu -- netlist.v \
---config testbench.json \
---watchlist signals.json \
---trace-output trace.csv \
---max-cycles 1000
-```
18 changes: 6 additions & 12 deletions README.md
@@ -59,28 +59,22 @@ cargo build -r --features cuda --bin jacquard

 ## Usage

-Jacquard operates in two phases:
-
-1. **Map** your synthesized gate-level netlist to a `.gemparts` file (one-time cost):
-
-```sh
-cargo run -r --bin jacquard -- map design.gv design.gemparts
-```
-
-2. **Simulate** with a VCD input waveform:
+Simulate a gate-level netlist with a VCD input waveform:

 ```sh
 # Metal (macOS) - use NUM_BLOCKS=1
-cargo run -r --features metal --bin jacquard -- sim design.gv design.gemparts input.vcd output.vcd 1
+cargo run -r --features metal --bin jacquard -- sim design.gv input.vcd output.vcd 1

 # CUDA (Linux) - set NUM_BLOCKS to 2x your GPU's SM count
-cargo run -r --features cuda --bin jacquard -- sim design.gv design.gemparts input.vcd output.vcd NUM_BLOCKS
+cargo run -r --features cuda --bin jacquard -- sim design.gv input.vcd output.vcd NUM_BLOCKS

 # With SDF timing back-annotation:
-cargo run -r --features metal --bin jacquard -- sim design.gv design.gemparts input.vcd output.vcd 1 \
+cargo run -r --features metal --bin jacquard -- sim design.gv input.vcd output.vcd 1 \
 --sdf design.sdf --sdf-corner typ
 ```

+Partitioning (mapping the design to GPU blocks) happens automatically at startup.
+
 **See [docs/usage.md](./docs/usage.md) for full documentation** including synthesis preparation, VCD scope handling, and troubleshooting.

 ## Documentation
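The `NUM_BLOCKS` rule of thumb in the usage notes (2× the SM count on CUDA, 2× the Compute Unit count on HIP/AMD, 1 on Metal) is simple arithmetic, and the sketch below only encodes it. `Backend` and `suggested_num_blocks` are hypothetical names for illustration; Jacquard itself takes `NUM_BLOCKS` as the last positional argument of `jacquard sim`.

```cpp
#include <stdexcept>

// Hypothetical backend tag used only for this illustration.
enum class Backend { Cuda, Hip, Metal };

// Encodes the documented rule of thumb for the NUM_BLOCKS argument:
//   CUDA:  2 x streaming multiprocessors (SMs)
//   HIP:   2 x Compute Units (CUs)
//   Metal: always 1
// `units` is the SM or CU count; it is ignored for Metal.
int suggested_num_blocks(Backend backend, int units) {
    switch (backend) {
    case Backend::Cuda:
    case Backend::Hip:
        if (units <= 0) {
            throw std::invalid_argument("SM/CU count must be positive");
        }
        return 2 * units;
    case Backend::Metal:
        return 1;
    }
    return 1;  // not reached for valid enum values; keeps -Wreturn-type quiet
}
```

On CUDA the SM count is reported in `cudaDeviceProp::multiProcessorCount` (filled in by `cudaGetDeviceProperties`), and HIP exposes the CU count under the same field name in `hipDeviceProp_t`. For example, a GPU with 108 SMs gives `NUM_BLOCKS` = 216.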
3 changes: 0 additions & 3 deletions benchmarks/.gitignore
@@ -1,5 +1,2 @@
-# Generated partition files (can be large, generate locally)
-*.gemparts
-
 # Simulation outputs
 *_output.vcd
33 changes: 6 additions & 27 deletions benchmarks/README.md
@@ -23,48 +23,27 @@ git submodule update --init --recursive

 ## Running Benchmarks

-### 1. Generate partition files (one-time)
+### 1. Run Metal simulation

-Each design needs a `.gemparts` file generated by the partitioner:
+Partitioning happens automatically at startup.

 ```bash
-# NVDLA (smallest, good for testing)
-cargo run -r --features metal --bin cut_map_interactive -- \
+# NVDLA benchmark (smallest, good for testing)
+cargo run -r --features metal --bin jacquard -- sim \
 benchmarks/dataset/nvdlaAIG.gv \
-benchmarks/nvdla.gemparts
-
-# Rocket
-cargo run -r --features metal --bin cut_map_interactive -- \
-benchmarks/dataset/rocketAIG.gv \
-benchmarks/rocket.gemparts
-
-# Gemmini
-cargo run -r --features metal --bin cut_map_interactive -- \
-benchmarks/dataset/gemminiAIG.gv \
-benchmarks/gemmini.gemparts
-```
-
-### 2. Run Metal simulation
-
-```bash
-# NVDLA benchmark
-cargo run -r --features metal --bin metal_test -- \
-benchmarks/dataset/nvdlaAIG.gv \
-benchmarks/nvdla.gemparts \
 benchmarks/dataset/nvdla.pdp_16x6x16_4x2_split_max_int8_0.vcd \
 benchmarks/nvdla_output.vcd \
 1

 # Rocket benchmark
-cargo run -r --features metal --bin metal_test -- \
+cargo run -r --features metal --bin jacquard -- sim \
 benchmarks/dataset/rocketAIG.gv \
-benchmarks/rocket.gemparts \
 benchmarks/dataset/rocket.median.vcd \
 benchmarks/rocket_output.vcd \
 1
 ```

-### 3. Criterion micro-benchmarks
+### 2. Criterion micro-benchmarks

 ```bash
 cargo bench --bench event_buffer
Binary file added benchmarks/nvdla.gemparts
Binary file not shown.
Binary file added benchmarks/rocket.gemparts
Binary file not shown.
16 changes: 10 additions & 6 deletions csrc/kernel_v1.cu
@@ -24,18 +24,20 @@ void simulate_v1_noninteractive_simple_scan_cuda(
 u32 *sram_xmask,
 usize num_cycles,
 usize state_size,
-u32 *states_noninteractive
+u32 *states_noninteractive,
+int arrival_state_offset
 )
 {
 const u32 *timing_constraints = nullptr;
 EventBuffer *event_buffer = nullptr;
-void *arg_ptrs[11] = {
+void *arg_ptrs[12] = {
 (void *)&num_blocks, (void *)&num_major_stages,
 (void *)&blocks_start, (void *)&blocks_data,
 (void *)&sram_data, (void *)&sram_xmask,
 (void *)&num_cycles, (void *)&state_size,
 (void *)&states_noninteractive,
-(void *)&timing_constraints, (void *)&event_buffer
+(void *)&timing_constraints, (void *)&event_buffer,
+(void *)&arrival_state_offset
 };
 checkCudaErrors(cudaLaunchCooperativeKernel(
 (void *)simulate_v1_noninteractive_simple_scan, num_blocks, 256,
@@ -56,16 +58,18 @@ void simulate_v1_noninteractive_timed_cuda(
 usize state_size,
 u32 *states_noninteractive,
 const u32 *timing_constraints,
-u8 *event_buffer
+u8 *event_buffer,
+int arrival_state_offset
 )
 {
-void *arg_ptrs[11] = {
+void *arg_ptrs[12] = {
 (void *)&num_blocks, (void *)&num_major_stages,
 (void *)&blocks_start, (void *)&blocks_data,
 (void *)&sram_data, (void *)&sram_xmask,
 (void *)&num_cycles, (void *)&state_size,
 (void *)&states_noninteractive,
-(void *)&timing_constraints, (void *)&event_buffer
+(void *)&timing_constraints, (void *)&event_buffer,
+(void *)&arrival_state_offset
 };
 checkCudaErrors(cudaLaunchCooperativeKernel(
 (void *)simulate_v1_noninteractive_simple_scan, num_blocks, 256,
16 changes: 10 additions & 6 deletions csrc/kernel_v1.hip.cpp
@@ -41,20 +41,22 @@ void simulate_v1_noninteractive_simple_scan_hip(
 u32 *sram_xmask,
 usize num_cycles,
 usize state_size,
-u32 *states_noninteractive
+u32 *states_noninteractive,
+int arrival_state_offset
 )
 {
 validate_warp_size();

 const u32 *timing_constraints = nullptr;
 EventBuffer *event_buffer = nullptr;
-void *arg_ptrs[11] = {
+void *arg_ptrs[12] = {
 (void *)&num_blocks, (void *)&num_major_stages,
 (void *)&blocks_start, (void *)&blocks_data,
 (void *)&sram_data, (void *)&sram_xmask,
 (void *)&num_cycles, (void *)&state_size,
 (void *)&states_noninteractive,
-(void *)&timing_constraints, (void *)&event_buffer
+(void *)&timing_constraints, (void *)&event_buffer,
+(void *)&arrival_state_offset
 };
 checkHipErrors(hipLaunchCooperativeKernel(
 (void *)simulate_v1_noninteractive_simple_scan,
@@ -76,18 +78,20 @@ void simulate_v1_noninteractive_timed_hip(
 usize state_size,
 u32 *states_noninteractive,
 const u32 *timing_constraints,
-u8 *event_buffer
+u8 *event_buffer,
+int arrival_state_offset
 )
 {
 validate_warp_size();

-void *arg_ptrs[11] = {
+void *arg_ptrs[12] = {
 (void *)&num_blocks, (void *)&num_major_stages,
 (void *)&blocks_start, (void *)&blocks_data,
 (void *)&sram_data, (void *)&sram_xmask,
 (void *)&num_cycles, (void *)&state_size,
 (void *)&states_noninteractive,
-(void *)&timing_constraints, (void *)&event_buffer
+(void *)&timing_constraints, (void *)&event_buffer,
+(void *)&arrival_state_offset
 };
 checkHipErrors(hipLaunchCooperativeKernel(
 (void *)simulate_v1_noninteractive_simple_scan,
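In all four CUDA and HIP wrappers, `arg_ptrs` had to be grown from 11 to 12 entries by hand when `arrival_state_offset` was appended. `cudaLaunchCooperativeKernel` and `hipLaunchCooperativeKernel` take a `void **` array whose entry *i* points to the kernel's *i*-th parameter, so the count and order must match the kernel signature. One way to keep the count in sync automatically is to derive the array from the argument list with a variadic template. The minimal C++ sketch below shows that approach. `make_kernel_args` is hypothetical and is not part of this PR or of either runtime.

```cpp
#include <array>

// Hypothetical helper: builds the `void *args[]` array expected by
// cudaLaunchCooperativeKernel / hipLaunchCooperativeKernel. Entry i holds the
// address of the i-th argument, so the array length always equals the number
// of arguments passed. Order must still match the kernel's parameter list.
// Arguments are taken as lvalue references, so temporaries are rejected at
// compile time and the stored addresses stay valid for the launch call.
template <typename... Args>
std::array<void *, sizeof...(Args)> make_kernel_args(Args &...args) {
    // The runtime API takes non-const void **, so const-qualified arguments
    // are cast through const void * first; the runtime only reads through them.
    return {{const_cast<void *>(static_cast<const void *>(&args))...}};
}
```

Inside a wrapper such as `simulate_v1_noninteractive_timed_cuda`, this could be called as `auto args = make_kernel_args(num_blocks, num_major_stages, blocks_start, blocks_data, sram_data, sram_xmask, num_cycles, state_size, states_noninteractive, timing_constraints, event_buffer, arrival_state_offset);`, with `args.data()` passed as the launch's argument array. Adding a parameter then resizes the array automatically. A mismatch in order or type against the kernel signature is still not caught at compile time.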