Skip to content

Add puzzle 49 to benchmark, track time instead of ops/s#104

Merged
oritwoen merged 2 commits intomainfrom
feat/benchmark-49bit-time-history
Apr 2, 2026
Merged

Add puzzle 49 to benchmark, track time instead of ops/s#104
oritwoen merged 2 commits intomainfrom
feat/benchmark-49bit-time-history

Conversation

@oritwoen
Copy link
Copy Markdown
Owner

@oritwoen oritwoen commented Apr 2, 2026

Performance History tracked ops/s - misleading metric. Higher throughput doesn't mean faster solve, wall-clock time does. Switched to seconds with improvement % based on time reduction. Added 49-bit puzzle case to the suite.

Rebuilt history table with real measurements on AMD RX 6800S across v0.3.0–v0.8.0, added missing v0.5.0. v0.2.0 wouldn't compile - dropped it.

Higher ops/s doesn't mean faster solving — what matters is wall-clock
time. Switch Performance History from rate (M/s) to time (seconds) with
improvement percentages based on time reduction.

Add 49-bit puzzle case to the benchmark suite using puzzle #49's known
public key.

Rebuild Performance History with real measurements from AMD RX 6800S
across all compilable versions (v0.3.0–v0.8.0), adding previously
missing v0.5.0 entry.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 2, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 80fb5b02-80e6-421c-b23b-d82bc47f298a

📥 Commits

Reviewing files that changed from the base of the PR and between d0f2167 and cec41d2.

📒 Files selected for processing (1)
  • src/benchmark.rs
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: cubic · AI code reviewer
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-02-20T07:57:41.039Z
Learnt from: CR
Repo: oritwoen/kangaroo PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-20T07:57:41.039Z
Learning: Applies to src/gpu_crypto/context.rs : Detect and warn about software renderer fallbacks (llvmpipe/SwiftShader)

Applied to files:

  • src/benchmark.rs
🔇 Additional comments (7)
src/benchmark.rs (7)

33-38: 49-bit case looks structurally correct.

Start 0x1000000000000 (2^48) with range_bits: 49 gives search space [2^48, 2^49) which is right for puzzle #49. The pubkey format is valid compressed SEC1.

Worth double-checking the pubkey matches the official puzzle #49 public key if you haven't already.


175-181: Clean metric switch from throughput to time.

Using time_secs directly instead of deriving M/s makes the history tracking more intuitive and aligns with the PR objective.


245-248: Good fix for the legacy preservation issue.

This handles the all-legacy scenario correctly. One edge case: if you somehow had mixed formats (old M/s rows + new seconds rows), the unparseable rows would silently drop. Probably won't happen since you're rebuilding history from scratch, but worth knowing.


303-308: Parsing logic works correctly for new format.

strip_suffix("s") on "15.00s" gives "15.00" which parses fine. Legacy "M/s" fails parse as expected, triggering the preservation guard.


338-342: Improvement calculation is correct for lower-is-better.

For baseline=25s, time=15s: (25-15)/25*100 = 40% → "+40%" meaning "40% faster". Math checks out.


605-636: Good test for legacy preservation.

Covers the exact scenario from the previous review comment - legacy M/s rows preserved when parsing fails.


511-514: Test data properly migrated to time format.

Tests use seconds format and improvement values that align with the new lower-is-better calculation.

Also applies to: 537-540


📝 Walkthrough

Walkthrough

Switched the 48-bit benchmark from reporting throughput (M/s) to execution time (seconds), inverted the improvement formula for lower-is-better semantics, added a 49-bit benchmark case, updated parsing/formatting and tests, and preserved legacy history when parsing fails.

Changes

Cohort / File(s) Summary
Benchmark docs
BENCHMARKS.md
Replaced "48-bit Rate (M/s)" column with "48-bit Time (s)" and updated table rows to report seconds and revised improvement percentages (v0.3.0 baseline).
Benchmark implementation & tests
src/benchmark.rs, tests/...
Added 49‑bit Case; changed history model from rate: f64 to time: f64; switched updates to use r.time_secs; parse/format now strip "s"; inverted improvement calculation; added defensive handling to preserve legacy/ unparsable history; updated tests and sample markdown expectations.

Sequence Diagram(s)

(Skipped — change is focused on metric representation/parsing and small control flow; not a multi-component sequential feature.)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

Poem

⏱️ Numbers flipped, the table sings,
48s now measured in swings,
49 joined the marching line,
old rows kept if they won't parse fine—
small change, clearer metric signs.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 45.45% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the two main changes: adding puzzle 49 to the benchmark suite and switching the performance metric from throughput (ops/s) to wall-clock time.
Description check ✅ Passed The description clearly explains why the change matters (throughput is misleading, wall-clock time is what matters) and covers both main objectives: metric change and 49-bit puzzle addition.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/benchmark-49bit-time-history
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch feat/benchmark-49bit-time-history

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot added the performance Performance improvements label Apr 2, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/benchmark.rs`:
- Around line 245-247: The code currently strips the trailing "s" and parses
legacy throughput strings (e.g. "25.10 M/s") which can leave malformed suffixes
(like "25.10 M/") leading to parse failures and emptying rows; update the
parsing logic used by upsert_history_row (and the helper that strips/splits
units) to robustly handle legacy formats by detecting parse errors and, on
failure, preserving the original rows instead of returning an empty vec: return
the original rows unchanged (or append the new row without deleting history),
and emit a debug/warn message so callers (e.g. format_history_section) won’t
overwrite the whole section when parsing fails. Ensure the change touches the
functions that perform the strip_suffix("s") and numeric parse so they
short-circuit on parse error and avoid destructive replacement.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: da6ba7c4-1963-4a51-b250-fc57bc5fcb6c

📥 Commits

Reviewing files that changed from the base of the PR and between 6f6e4a6 and d0f2167.

📒 Files selected for processing (2)
  • BENCHMARKS.md
  • src/benchmark.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: cubic · AI code reviewer
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-02-20T07:57:41.039Z
Learnt from: CR
Repo: oritwoen/kangaroo PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-20T07:57:41.039Z
Learning: Applies to src/gpu_crypto/context.rs : Detect and warn about software renderer fallbacks (llvmpipe/SwiftShader)

Applied to files:

  • src/benchmark.rs
🔇 Additional comments (5)
src/benchmark.rs (4)

33-38: 49-bit benchmark case is wired correctly

This case is consistent with the existing benchmark flow and should execute without special handling.


175-182: 48-bit history source switched cleanly to wall-clock time

Line 175 through Line 182 now pulls time_secs for the 48-bit row and only updates history when present, which avoids writing partial history data.


331-349: Improvement math matches the new metric direction

Line 338 correctly computes improvement as time reduction vs baseline, so faster runs produce positive percentages.


507-547: Updated test fixtures/assertions are aligned with time-based history

The history fixtures and assertions now validate seconds-format rows and the new expected values.

BENCHMARKS.md (1)

33-41: Performance History table is consistent with the new time-based metric

Header, units, and improvement direction are coherent in this section.

Comment thread src/benchmark.rs
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 2 files

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.

Auto-approved: Low-impact changes restricted to the benchmarking tool and its documentation. Switching metrics and adding a test case does not affect core business logic or security.

Architecture diagram
sequenceDiagram
    participant User as CLI (kangaroo)
    participant Runner as Benchmark Runner
    participant Suites as Benchmark Cases
    participant Parser as History Logic
    participant FS as File System (BENCHMARKS.md)

    User->>Runner: --benchmark
    
    loop For each Case in CASES
        Runner->>Suites: Execute puzzle solver
        Note over Suites: Includes NEW: 49-bit puzzle
        Suites-->>Runner: Return CaseResult (time_secs)
    end

    Runner->>FS: Read existing documentation
    FS-->>Runner: Markdown content

    Runner->>Parser: update_performance_history(content, version, time_secs)
    
    Parser->>Parser: parse_history_rows()
    Note right of Parser: CHANGED: Parses "s" suffix<br/>instead of "M/s"

    alt Existing version found
        Parser->>Parser: Update existing row time
    else New version
        Parser->>Parser: Append new HistoryRow
    end

    Parser->>Parser: format_history_section()
    Note right of Parser: CHANGED: Improvement % calculated<br/>via time reduction (lower is better)

    Parser-->>Runner: Updated Markdown content
    Runner->>FS: Write updated BENCHMARKS.md
    Runner-->>User: Display results summary
Loading

If existing Performance History has M/s format rows that can't be parsed
as seconds, keep the section unchanged instead of silently dropping it.
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 issues found across 1 file (changes from recent commits).

Auto-approved: This is a safe update to benchmarking utilities and documentation. It switches performance metrics to time-based measurements and adds a new test case without affecting core logic.

@oritwoen oritwoen self-assigned this Apr 2, 2026
@oritwoen oritwoen merged commit e550cbd into main Apr 2, 2026
4 checks passed
@oritwoen oritwoen deleted the feat/benchmark-49bit-time-history branch April 2, 2026 19:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance Performance improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant