📝 Walkthrough
This change adjusts internal GPU kernel calibration and dispatch tuning in the solver module. The calibration timing target increased from 50 to 120 milliseconds, the workgroup variant selection threshold transitioned from an exclusive to an inclusive comparison at 65,536 kangaroos, and the calibration candidate list gained non-power-of-two step counts.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/solver.rs`:
- Around line 24-29: The constant TARGET_DISPATCH_MS in solver.rs was raised to
120, violating the file-level TDR safety contract requiring calibration under
50ms; change TARGET_DISPATCH_MS back to 50 (or a value <= 50) so the
auto-calibration in the solver honors the 50ms TDR threshold and avoids long GPU
startup stalls.
- Around line 601-605: The early-exit uses a hard-coded doubling check that no
longer matches the non-power-of-two candidates array (candidates), causing
calibration to stop too early; change the stopping condition to predict the next
candidate's time by multiplying elapsed_ms by the ratio of
next_candidate/current_candidate (use candidates[i] and candidates[i+1]) and
compare that to target (instead of elapsed_ms * 2 > target), so the loop only
breaks when the predicted time for the next candidate would exceed target.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 9e8a6860-6acc-4b1f-af95-b1fc88d0730a
📒 Files selected for processing (1)
src/solver.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: cubic · AI code reviewer
🧰 Additional context used
📓 Path-based instructions (1)
src/solver.rs
📄 CodeRabbit inference engine (AGENTS.md)
src/solver.rs: Implement solver state machine logic in src/solver.rs using the step() method and calibration
Auto-calibrate steps to stay under 50ms TDR (Timeout Detection and Recovery) threshold
Files:
src/solver.rs
🧠 Learnings (11)
📓 Common learnings (Learnt from: CR, Repo: oritwoen/kangaroo PR: 0, File: AGENTS.md:0-0, Timestamp: 2026-02-20T07:57:41.039Z)
- Applies to src/shaders/kangaroo_affine.wgsl: Implement main GPU compute logic in WGSL shaders located at src/shaders/kangaroo_affine.wgsl
- Applies to src/shaders/*.wgsl: Use workgroup size of 64 threads in compute shaders (hardcoded)
- Applies to src/solver.rs: Auto-calibrate steps to stay under 50ms TDR (Timeout Detection and Recovery) threshold
📚 Learnings applied to src/solver.rs (same source and timestamp as above unless noted)
- Applies to src/solver.rs: Implement solver state machine logic in src/solver.rs using the step() method and calibration
- Applies to src/cpu/dp_table.rs: Never trust DP buffer count without clamping to MAX_DISTINGUISHED_POINTS
- Applies to src/cpu/dp_table.rs: Implement collision detection using the Distinguished Points (DP) pattern in src/cpu/dp_table.rs
- Applies to src/cpu/dp_table.rs: Never use Jacobian coordinates in final DP storage; must convert to affine coordinates
- Applies to src/gpu/buffers.rs: Clamp DP buffer to 90% capacity per dispatch to prevent overflow
- Applies to src/gpu/**/*.rs: Never add y_parity or symClass to GpuDistinguishedPoint; collision resolution must be done CPU-side via compute_candidate_keys()
- Applies to src/cpu/init.rs: Never split jump table for negation map; use add/sub operations with the same 256-entry table
- Auto-calculate DP bits from range_bits and clamp to 8-40 range
- Learnt from: oritwoen, Repo: oritwoen/kangaroo PR: 100, File: src/gpu/pipeline.rs:35-37, Timestamp: 2026-03-31T16:28:59.172Z: The `kangaroo` project is a CLI solver, not a service. `GpuContext` is created once per process run, and the process exits after solving. There is no create/destroy cycle for GPU contexts, so a static `PIPELINE_CACHE` holding strong `Arc` references in `src/gpu/pipeline.rs` is intentional and safe for this use case.
🔇 Additional comments (1)
src/solver.rs (1)
205-205: Boundary condition fix is correct. Line 205's switch to `num_kangaroos >= 65_536` removes the exact-threshold split and makes variant selection deterministic at the boundary.
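A minimal sketch of the inclusive boundary; the `Variant` enum and the function signature are assumptions for illustration, and only the `num_kangaroos >= 65_536` comparison comes from the diff.

```rust
#[derive(Debug, PartialEq)]
enum Variant {
    Wg64,
    Wg128,
}

/// Inclusive comparison: a herd of exactly 65,536 now selects Wg128,
/// where the previous strict `>` left it on Wg64.
fn select_best_variant(num_kangaroos: u32) -> Variant {
    if num_kangaroos >= 65_536 {
        Variant::Wg128
    } else {
        Variant::Wg64
    }
}
```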
No issues found across 1 file
Confidence score: 5/5
- Automated review surfaced no issues in the provided summaries.
- No files require special attention.
Auto-approved: Small, low-risk performance tuning and an off-by-one fix in GPU solver calibration. Isolated logic changes with low risk of breakage.
Architecture diagram
sequenceDiagram
participant S as KangarooSolver
participant C as GPU Context
participant G as GPU Backend
Note over S,G: Initialization & Variant Selection
S->>C: Query max_workgroup_size()
S->>S: CHANGED: select_best_variant()
Note right of S: Now selects Wg128 if kangaroos >= 65,536<br/>(Fixed off-by-one error)
Note over S,G: Calibration Phase
loop NEW: Candidates [16, 17, 18, 24, 64, 128, 256, 512]
S->>G: Dispatch test workload
G-->>S: Return measured duration
S->>S: CHANGED: Evaluate vs TARGET_DISPATCH_MS
Note right of S: Threshold increased to 120ms to<br/>mask host-to-device latency overhead
end
S->>S: Select optimal steps_per_call
S-->>S: Finalize KangarooPipeline
Off-by-one in `select_best_variant()` kept 65k herds on `Wg64` when they should hit `Wg128`. Dispatch target was too low at 50ms; bumped to 120ms so calibration doesn't overpay the host round-trip on small solves. Added non-power-of-two candidates around the 24-32 step cliff where benchmark GPUs actually diverge.

Fixes #102