Add puzzle 49 to benchmark, track time instead of ops/s #104
Higher ops/s doesn't mean faster solving — what matters is wall-clock time. Switch Performance History from rate (M/s) to time (seconds) with improvement percentages based on time reduction. Add 49-bit puzzle case to the benchmark suite using puzzle #49's known public key. Rebuild Performance History with real measurements from AMD RX 6800S across all compilable versions (v0.3.0–v0.8.0), adding previously missing v0.5.0 entry.
📝 Walkthrough
Switched the 48-bit benchmark from reporting throughput (M/s) to execution time (seconds), inverted the improvement formula for lower-is-better semantics, added a 49-bit benchmark case, updated parsing/formatting and tests, and preserved legacy history when parsing fails.
Sequence Diagram(s)
(Skipped: change is focused on metric representation/parsing and small control flow; not a multi-component sequential feature.)
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/benchmark.rs`:
- Around line 245-247: The code currently strips the trailing "s" and parses
legacy throughput strings (e.g. "25.10 M/s") which can leave malformed suffixes
(like "25.10 M/") leading to parse failures and emptying rows; update the
parsing logic used by upsert_history_row (and the helper that strips/splits
units) to robustly handle legacy formats by detecting parse errors and, on
failure, preserving the original rows instead of returning an empty vec: return
the original rows unchanged (or append the new row without deleting history),
and emit a debug/warn message so callers (e.g. format_history_section) won’t
overwrite the whole section when parsing fails. Ensure the change touches the
functions that perform the strip_suffix("s") and numeric parse so they
short-circuit on parse error and avoid destructive replacement.
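The non-destructive parsing this comment asks for could look roughly like the sketch below. It is not the code in `src/benchmark.rs`: the `HistoryRow` shape, the `parse_time_cell` helper, and the two-column row layout are assumptions made for illustration. The point is that a legacy cell such as "25.10 M/s" yields `None`, and a single bad row makes the whole parse return `None` rather than an empty vec, so a caller can tell "no rows" apart from "could not parse".

```rust
/// One row of the Performance History table (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
struct HistoryRow {
    version: String,
    time_secs: f64,
}

/// Parse a time cell such as "12.34s" into seconds.
/// Returns None for anything else. A legacy "25.10 M/s" cell becomes
/// "25.10 M/" once the trailing 's' is stripped, which fails the numeric
/// parse and therefore also yields None.
fn parse_time_cell(cell: &str) -> Option<f64> {
    let number = cell.trim().strip_suffix('s')?.trim();
    let secs = number.parse::<f64>().ok()?;
    // Reject NaN, infinities, and negative durations.
    (secs.is_finite() && secs >= 0.0).then_some(secs)
}

/// Parse every table row, short-circuiting to None on the first row that
/// cannot be read. Assumes rows look like "| version | 12.34s | ... |".
fn parse_history_rows(lines: &[&str]) -> Option<Vec<HistoryRow>> {
    lines
        .iter()
        .map(|line| {
            let cells: Vec<&str> = line
                .trim()
                .trim_matches('|')
                .split('|')
                .map(str::trim)
                .collect();
            let version = cells.first()?.to_string();
            let time_secs = parse_time_cell(cells.get(1).copied()?)?;
            Some(HistoryRow { version, time_secs })
        })
        .collect()
}
```

With this shape, a caller that receives `None` can log a warning and leave the existing section untouched instead of rewriting it from an empty row list.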
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: da6ba7c4-1963-4a51-b250-fc57bc5fcb6c
📒 Files selected for processing (2)
BENCHMARKS.md
src/benchmark.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: cubic · AI code reviewer
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-02-20T07:57:41.039Z
Learnt from: CR
Repo: oritwoen/kangaroo PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-20T07:57:41.039Z
Learning: Applies to src/gpu_crypto/context.rs : Detect and warn about software renderer fallbacks (llvmpipe/SwiftShader)
Applied to files:
src/benchmark.rs
🔇 Additional comments (5)
src/benchmark.rs (4)
33-38: 49-bit benchmark case is wired correctly
This case is consistent with the existing benchmark flow and should execute without special handling.
175-182: 48-bit history source switched cleanly to wall-clock time
Line 175 through Line 182 now pulls `time_secs` for the 48-bit row and only updates history when present, which avoids writing partial history data.
331-349: Improvement math matches the new metric direction
Line 338 correctly computes improvement as time reduction vs baseline, so faster runs produce positive percentages.
507-547: Updated test fixtures/assertions are aligned with time-based history
The history fixtures and assertions now validate seconds-format rows and the new expected values.
BENCHMARKS.md (1)
33-41: Performance History table is consistent with the new time-based metric
Header, units, and improvement direction are coherent in this section.
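For reference, the lower-is-better improvement calculation described in these comments can be sketched as follows. The function names and the choice of baseline are assumptions; the PR only states that improvement is computed as time reduction versus a baseline, so that faster runs produce positive percentages.

```rust
/// Percentage improvement of `current_secs` relative to `baseline_secs`
/// when lower is better. Positive means the current run is faster.
/// Returns None if the baseline is not a positive, finite duration.
fn improvement_percent(baseline_secs: f64, current_secs: f64) -> Option<f64> {
    if !baseline_secs.is_finite() || !current_secs.is_finite() || baseline_secs <= 0.0 {
        return None;
    }
    Some((baseline_secs - current_secs) / baseline_secs * 100.0)
}

/// Render the improvement as a signed table cell, or "-" when it is undefined.
fn format_improvement(baseline_secs: f64, current_secs: f64) -> String {
    match improvement_percent(baseline_secs, current_secs) {
        Some(pct) => format!("{:+.1}%", pct),
        None => "-".to_string(),
    }
}
```

Under the old throughput metric the numerator would have been `current - baseline`; swapping the operands is what keeps "positive means better" true once the metric is time.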
No issues found across 2 files
Confidence score: 5/5
- Automated review surfaced no issues in the provided summaries.
- No files require special attention.
Auto-approved: Low-impact changes restricted to the benchmarking tool and its documentation. Switching metrics and adding a test case does not affect core business logic or security.
Architecture diagram
sequenceDiagram
participant User as CLI (kangaroo)
participant Runner as Benchmark Runner
participant Suites as Benchmark Cases
participant Parser as History Logic
participant FS as File System (BENCHMARKS.md)
User->>Runner: --benchmark
loop For each Case in CASES
Runner->>Suites: Execute puzzle solver
Note over Suites: Includes NEW: 49-bit puzzle
Suites-->>Runner: Return CaseResult (time_secs)
end
Runner->>FS: Read existing documentation
FS-->>Runner: Markdown content
Runner->>Parser: update_performance_history(content, version, time_secs)
Parser->>Parser: parse_history_rows()
Note right of Parser: CHANGED: Parses "s" suffix<br/>instead of "M/s"
alt Existing version found
Parser->>Parser: Update existing row time
else New version
Parser->>Parser: Append new HistoryRow
end
Parser->>Parser: format_history_section()
Note right of Parser: CHANGED: Improvement % calculated<br/>via time reduction (lower is better)
Parser-->>Runner: Updated Markdown content
Runner->>FS: Write updated BENCHMARKS.md
Runner-->>User: Display results summary
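The "existing version found / new version" branch in the diagram amounts to an upsert keyed on the version string. A minimal sketch, assuming a hypothetical `HistoryRow` with `version` and `time_secs` fields (the real struct and function signature in `src/benchmark.rs` may differ):

```rust
/// One row of the Performance History table (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
struct HistoryRow {
    version: String,
    time_secs: f64,
}

/// Update the row for `version` if it exists, otherwise append a new one.
/// Existing rows for other versions are never removed or reordered.
fn upsert_history_row(rows: &mut Vec<HistoryRow>, version: &str, time_secs: f64) {
    // Look up the index first so the search borrow ends before we mutate.
    let existing = rows.iter().position(|row| row.version == version);
    match existing {
        Some(index) => rows[index].time_secs = time_secs,
        None => rows.push(HistoryRow {
            version: version.to_string(),
            time_secs,
        }),
    }
}
```

Re-running the benchmark for the same version then overwrites that version's time in place, while a new version adds a row at the end of the table.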
If existing Performance History has M/s format rows that can't be parsed as seconds, keep the section unchanged instead of silently dropping it.
Performance History tracked ops/s, which is a misleading metric: higher throughput doesn't mean a faster solve, wall-clock time does. Switched to seconds with improvement % based on time reduction. Added 49-bit puzzle case to the suite.
Rebuilt history table with real measurements on AMD RX 6800S across v0.3.0–v0.8.0, added missing v0.5.0. v0.2.0 wouldn't compile - dropped it.