Fix 2x clock frequency bug and add timing investigation tools#58

Merged
robtaylor merged 4 commits into main from timing-investigation
Mar 4, 2026
Conversation

@robtaylor
Contributor

Summary

  • Fix 2x clock frequency bug in multi-clock scheduler — DFFs only captured every other cycle because the scheduler's per-GCD-tick entries (one per half-cycle) were consumed one per tick instead of in pairs. This was the root cause of Issue #54 ("Investigate 2x UART baud rate factor in Amaranth-generated designs"): 2x UART baud rate and 2x flash clock frequency.
  • Add --timing-vcd to cosim for native timing-accurate VCD output with per-signal arrival offsets from SDF data
  • Add --dump-dff to cosim for debugging internal DFF state per cycle
  • Fix SDF loading for hierarchical netlists: detect and strip instance prefix
  • Remove legacy single-clock code path — multi-clock scheduler now handles all cases
  • Add SDF timing path tracer (scripts/sdf_trace.py) for debugging timing discrepancies
  • Add dump-paths CLI subcommand for AIG critical path analysis
  • Wire arrival_state_offset through CUDA/HIP kernels for cross-platform --timing-vcd

Key fix: 2x clock frequency (Issue #54)

The MultiClockScheduler produces one entry per GCD tick (half-cycle). For a single 40ns clock: [falling, rising]. Each cosim tick used ONE entry for both fall and rise phases:

  • Even ticks → posedge_flag=0 → DFFs don't capture
  • Odd ticks → posedge_flag=1 → DFFs capture

Fix: pair consecutive schedule entries so each tick gets correct fall + rise ops.
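The pairing described above can be sketched as follows. This is a minimal illustration, not the actual jacquard scheduler code; the entry structure and names (`pair_schedule`, `fall_ops`, `rise_ops`) are hypothetical.

```python
# Sketch of the schedule-pairing fix: the scheduler emits one entry per
# GCD tick, alternating fall/rise; pairing them gives each cosim tick
# both phases, so DFFs see a rising edge on every cycle.

def pair_schedule(entries):
    """Pair consecutive (fall, rise) GCD-tick entries into cosim ticks."""
    assert len(entries) % 2 == 0, "expected alternating fall/rise entries"
    paired = []
    for fall, rise in zip(entries[0::2], entries[1::2]):
        paired.append({"fall_ops": fall["ops"], "rise_ops": rise["ops"]})
    return paired

# Single 40ns clock: two half-cycle entries collapse into one paired tick.
schedule = [{"edge": "fall", "ops": ["clk=0"]},
            {"edge": "rise", "ops": ["clk=1"]}]
print(len(pair_schedule(schedule)))  # 1 cosim tick per full clock cycle
```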

Metric                        | Before             | After
flash_clk transitions vs CVC  | 485 vs 973 (0.50x) | 972 vs 973 (1.00x)
Non-flash GPIO match vs CVC   | 42.9%              | 100%

Fixes #54

Test plan

  • Cosim MCU SoC passes (SIMULATION: PASSED)
  • Non-flash GPIO bits match CVC 100% (was 42.9%)
  • Flash clock transition count matches CVC (972 vs 973)
  • DFF state dump shows captures every cycle (no frozen pairs)
  • CI check (CUDA/HIP builds)

…prefix

When a netlist wraps cells inside a submodule (e.g., openframe_project_wrapper
→ top top_inst → cells), NetlistDB produces paths like "top_inst._58619_" but
OpenROAD's SDF has flat paths like "_58619_". This caused all SDF lookups to
fail silently, resulting in zero timing delays.

Fix: sample the first 100 cell paths against the SDF. If most fail but
stripping the first dot-separated component succeeds, strip the prefix
globally. This correctly detects and handles the wrapper hierarchy without
hardcoding module names.
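The sampling heuristic can be sketched as below. This is illustrative only, not the actual NetlistDB/SDF loader; the function name, sample size handling, and 50% threshold are assumptions for the sketch.

```python
# Sketch of the prefix-detection heuristic: sample cell paths against the
# set of SDF cell names; if direct lookups mostly fail but stripping the
# first dot-separated component mostly succeeds, strip the prefix globally.

def should_strip_prefix(cell_paths, sdf_cells, sample=100, threshold=0.5):
    sampled = cell_paths[:sample]
    direct = sum(p in sdf_cells for p in sampled)
    stripped = sum(p.split(".", 1)[-1] in sdf_cells
                   for p in sampled if "." in p)
    # Strip only when direct lookups mostly fail AND stripped ones work.
    return (direct < threshold * len(sampled)
            and stripped >= threshold * len(sampled))

sdf = {"_58619_", "_58620_"}
paths = ["top_inst._58619_", "top_inst._58620_"]
print(should_strip_prefix(paths, sdf))  # True -> strip "top_inst." globally
```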

Result: 28,381 SDF cells matched (previously 0), timing now properly applied
including PnR-inserted output buffers, clock buffers, and fanout buffers.

Co-developed-by: Claude Code v2.1.50 (claude-opus-4-6)
Extend jacquard with ability to analyze and dump AIG critical paths with full timing details.

Changes:
- Add AIG.dump_critical_paths_detailed() method that returns formatted string showing:
  * Per-endpoint critical path from source to sink
  * Each node's cell origin (synthesis cells that created it)
  * Gate delays from Liberty library
  * Cumulative arrival times

- Add 'dump-paths' CLI subcommand to jacquard binary
  * Takes netlist, SDF, and Liberty library files
  * Configurable output limit (default: top 5 paths)
  * Dumps complete path details for debugging timing issues

- Update main() to handle new DumpPaths command

Tested on MCU SoC (6_final.v): Successfully traces multiple critical paths
with cell origins and per-gate timing information.

This is the final step of the timing comparison report implementation (goal step 15).

Co-developed-by: Claude Haiku 4.5 (claude-haiku-4-5-20251001)
The MultiClockScheduler produces one entry per GCD tick (half-cycle),
alternating between falling and rising edges. Each cosim tick does both
a fall eval and rise eval, consuming one schedule entry for both phases.

For a single clock with schedule [fall, rise], this meant:
- Even ticks used the falling-edge entry → posedge_flag=0 → DFFs skip
- Odd ticks used the rising-edge entry → posedge_flag=1 → DFFs capture

DFFs only captured every other cycle, making the design run at half
the correct frequency. This was the root cause of Issue #54 (2x UART
baud rate factor).

Fix: pair consecutive schedule entries so each cosim tick gets fall_ops
from the falling-edge GCD tick and rise_ops from the rising-edge GCD
tick. For a single 40ns clock, this reduces the schedule from 2 entries
(alternating, broken) to 1 paired entry (correct).

Also removes the legacy single-clock code path (build_falling_edge_ops,
build_rising_edge_ops, all_posedge_flag_bits, all_negedge_flag_bits)
since the multi-clock scheduler now handles all cases.

Also adds --dump-dff option for debugging internal DFF state per cycle,
and fixes an AIG type reference in run_timing_analysis.

Fixes #54

Co-developed-by: Claude Code v2.1.50 (claude-opus-4-6)
The 2x factor in cycles_per_bit was compensating for the half-speed
DFF capture bug (now fixed). With DFFs capturing every cycle, the
correct cycles_per_bit for 25MHz/115200 baud is 217, not 434.
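The corrected constant follows directly from the clock and baud rate; a quick check of the arithmetic:

```python
# Cycles per UART bit at a 25 MHz system clock and 115200 baud.
# The exact quotient is ~217.01, so the integer divisor is 217.
clk_hz = 25_000_000
baud = 115_200
cycles_per_bit = round(clk_hz / baud)
print(cycles_per_bit)  # 217 — the old code's 2x factor gave 434
```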

Co-developed-by: Claude Code v2.1.50 (claude-opus-4-6)
@robtaylor robtaylor merged commit c57a94b into main Mar 4, 2026
12 checks passed
@robtaylor robtaylor deleted the timing-investigation branch March 4, 2026 22:14

