perf(gpu): cache compute pipelines across repeated solves #100
Conversation
📝 Walkthrough
Added a process-wide pipeline cache in `src/gpu/pipeline.rs`, so repeated solves reuse already-compiled compute pipelines instead of rebuilding them.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Key concern: the cache key is the raw pointer address of the device (via `Arc::as_ptr`), which can be reused after a device is dropped.
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
1 issue found across 1 file
Confidence score: 4/5
- This looks safe to merge overall; the main issue is a low-to-moderate severity concurrency inefficiency rather than a functional bug.
- In `src/gpu/pipeline.rs`, concurrent calls to `new` can both miss the cache and compile the same pipeline, causing duplicate compile cost under contention.
- Pay close attention to `src/gpu/pipeline.rs`: the cache check/insert split across compilation can lead to duplicate work in concurrent scenarios.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/gpu/pipeline.rs">
<violation number="1" location="src/gpu/pipeline.rs:41">
P2: Cache check and insert are split by compilation, so concurrent calls can both miss and compile the same pipeline. That means you still pay the shader compile cost twice when two threads hit `new` at the same time.</violation>
</file>
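The race the reviewer describes can be demonstrated with a std-only sketch. Everything here (`racy_get`, `CACHE`, `COMPILES`, the `String` "pipeline") is an illustrative stand-in, not the repo's actual code; a `Barrier` forces both threads past the cache miss so the duplicate compile happens deterministically:

```rust
use std::collections::HashMap;
use std::sync::{Barrier, Mutex, OnceLock};
use std::thread;

static CACHE: OnceLock<Mutex<HashMap<u32, String>>> = OnceLock::new();
static COMPILES: Mutex<u32> = Mutex::new(0);

// The problematic pattern: check under the lock, release it, compile,
// then re-lock to insert. Two threads can both observe a miss.
fn racy_get(variant: u32, barrier: &Barrier) -> String {
    let cache = CACHE.get_or_init(|| Mutex::new(HashMap::new()));
    if let Some(p) = cache.lock().unwrap().get(&variant) {
        return p.clone(); // cache hit
    }
    barrier.wait(); // force both threads past the miss before either inserts
    *COMPILES.lock().unwrap() += 1; // stands in for the shader compile cost
    let p = format!("pipeline-{variant}");
    cache.lock().unwrap().insert(variant, p.clone());
    p
}

fn main() {
    let barrier = Barrier::new(2);
    thread::scope(|s| {
        s.spawn(|| racy_get(64, &barrier));
        s.spawn(|| racy_get(64, &barrier));
    });
    // Both threads missed and both paid the compile cost for the same key.
    assert_eq!(*COMPILES.lock().unwrap(), 2);
}
```

Once an entry is in the map, later callers do hit the cache; the duplicate work only occurs on first use under contention, which matches the P2 severity.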
Architecture diagram
sequenceDiagram
participant App as Solver / Client
participant KP as KangarooPipeline
participant Cache as PIPELINE_CACHE (Static)
participant WGPU as wgpu / GPU Driver
Note over App,WGPU: Request for Compute Pipeline (KangarooPipeline::new)
App->>KP: new(ctx, variant)
KP->>KP: Extract device pointer as key
KP->>Cache: NEW: Lock and get(device_key, variant)
alt NEW: Cache Hit (Pipeline exists)
Cache-->>KP: Return cloned KangarooPipeline
KP-->>App: Return cached pipeline (SKIP Compilation)
else NEW: Cache Miss (First solve for device/variant)
Cache-->>KP: None
KP->>KP: Load shader sources (FIELD_WGSL, etc.)
KP->>WGPU: create_bind_group_layout()
WGPU-->>KP: BindGroupLayout
KP->>WGPU: create_compute_pipeline()
Note right of WGPU: Heavy operation: JIT Shader Compilation
WGPU-->>KP: ComputePipeline
KP->>Cache: NEW: Insert(device_key, variant, pipeline)
KP-->>App: Return new pipeline
end
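The flow in the diagram can be sketched in Rust with std-only stand-ins. The names (`Device`, `KangarooPipeline`, `new_pipeline`) and the manual `compiles` counter are illustrative assumptions, not the repo's actual API; the real code would call `create_compute_pipeline()` where the counter increments:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

// Illustrative stand-ins; the real types live in src/gpu/pipeline.rs.
struct Device;
#[derive(Clone)]
struct KangarooPipeline { variant: u32 }

type Key = (usize, u32); // (device pointer address, workgroup variant)

static PIPELINE_CACHE: OnceLock<Mutex<HashMap<Key, KangarooPipeline>>> = OnceLock::new();

fn new_pipeline(device: &Arc<Device>, variant: u32, compiles: &mut u32) -> KangarooPipeline {
    let device_key = Arc::as_ptr(device) as usize; // extract device pointer as key
    let cache = PIPELINE_CACHE.get_or_init(|| Mutex::new(HashMap::new()));
    let mut guard = cache.lock().unwrap();
    if let Some(p) = guard.get(&(device_key, variant)) {
        return p.clone(); // cache hit: skip compilation entirely
    }
    // Cache miss: this is where the heavy create_compute_pipeline()
    // call (JIT shader compilation) would happen.
    *compiles += 1;
    let p = KangarooPipeline { variant };
    guard.insert((device_key, variant), p.clone());
    p
}

fn main() {
    let device = Arc::new(Device);
    let mut compiles = 0;
    new_pipeline(&device, 64, &mut compiles);
    new_pipeline(&device, 64, &mut compiles);  // hit: no recompile
    new_pipeline(&device, 128, &mut compiles); // different variant: miss
    assert_eq!(compiles, 2);
}
```

Note that the sketch deliberately reproduces the reviewed design, including its weak points: the map lock brackets the whole miss path, and the key is a raw pointer value.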
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.
Prevents concurrent threads from both missing the cache and compiling the same shader twice.
1 issue found across 1 file (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/gpu/pipeline.rs">
<violation number="1" location="src/gpu/pipeline.rs:41">
P2: The cache lock is held during full pipeline compilation, so unrelated cache misses are serialized behind one global mutex.</violation>
</file>
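One way to address both review findings at once is a per-key slot: hold the map lock only to fetch an `Arc<OnceLock<_>>`, then compile outside it. This is a hedged sketch with stand-in types (`Pipeline`, `get_or_compile`, the `compiles` counter are illustrative, not the repo's code), relying on `OnceLock::get_or_init` running its closure exactly once per slot:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

#[derive(Clone)]
struct Pipeline { variant: u32 }

type Key = (usize, u32);          // (device pointer address, workgroup variant)
type Slot = Arc<OnceLock<Pipeline>>;

static CACHE: OnceLock<Mutex<HashMap<Key, Slot>>> = OnceLock::new();

fn get_or_compile(key: Key, compiles: &Mutex<u32>) -> Pipeline {
    // Hold the map lock only long enough to fetch or create the slot,
    // so unrelated cache misses are not serialized behind compilation.
    let slot: Slot = {
        let map = CACHE.get_or_init(|| Mutex::new(HashMap::new()));
        map.lock().unwrap().entry(key).or_default().clone()
    };
    // Compile outside the map lock. OnceLock guarantees the closure runs
    // at most once per slot; concurrent callers for the same key wait here
    // instead of compiling a duplicate.
    slot.get_or_init(|| {
        *compiles.lock().unwrap() += 1; // stands in for shader compilation
        Pipeline { variant: key.1 }
    })
    .clone()
}

fn main() {
    let compiles = Mutex::new(0u32);
    let a = get_or_compile((0xA, 64), &compiles);
    let b = get_or_compile((0xA, 64), &compiles);
    assert_eq!((a.variant, b.variant), (64, 64));
    assert_eq!(*compiles.lock().unwrap(), 1); // compiled once, reused after
}
```

Callers racing on *different* keys compile in parallel, while callers racing on the *same* key pay the compile cost exactly once.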
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `src/gpu/pipeline.rs`:
- Around line 35-37: The static PIPELINE_CACHE stores KangarooPipeline entries
containing Arc<ComputePipeline> and Arc<BindGroupLayout>, which prevents GPU
objects from ever being dropped across different GpuContext/device lifetimes;
change the cache to hold Weak references (e.g., Weak<ComputePipeline> and
Weak<BindGroupLayout>) and update cache access code in functions that
create/load KangarooPipeline to attempt .upgrade(), recreate and replace entries
when upgrade fails, and periodically purge expired entries, or alternatively
refactor the cache out of the static and make it device-scoped on the GpuContext
so pipelines are tied to the device instance.
- Around line 39-43: The cache keying by raw pointer (device_key via
Arc::as_ptr) is unsafe across device drops; change PIPELINE_CACHE to key by a
Weak<wgpu::Device> (or store Weak in the map entries) and, on lookup in the
pipeline retrieval path (where device_key, PIPELINE_CACHE, guard.get, and
variant are used), iterate/clean the map: upgrade each Weak to Arc, drop dead
Weaks, and compare candidate devices with the current ctx.device using
Arc::ptr_eq before returning the cached pipeline; if no match, insert a new
entry storing Arc::downgrade(&ctx.device) alongside the pipeline so stale
pointer reuse cannot return resources from a different device.
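A device-identity-safe cache along the lines CodeRabbit suggests might look like the following sketch. `Device`, `Entry`, `lookup`, and `insert` are hypothetical std-only stand-ins (the real device is `wgpu::Device`); the point is storing `Weak<Device>` so entries neither keep GPU objects alive nor survive a recycled pointer address:

```rust
use std::sync::{Arc, Mutex, OnceLock, Weak};

// Illustrative stand-ins for wgpu::Device and the cached pipeline.
struct Device;
#[derive(Clone)]
struct Pipeline { variant: u32 }

struct Entry {
    device: Weak<Device>, // does not keep the device (or GPU objects) alive
    variant: u32,
    pipeline: Pipeline,
}

static CACHE: OnceLock<Mutex<Vec<Entry>>> = OnceLock::new();

fn lookup(device: &Arc<Device>, variant: u32) -> Option<Pipeline> {
    let cache = CACHE.get_or_init(|| Mutex::new(Vec::new()));
    let mut guard = cache.lock().unwrap();
    // Purge entries whose device has been dropped, so a recycled pointer
    // can never alias resources from a dead device.
    guard.retain(|e| e.device.strong_count() > 0);
    guard
        .iter()
        .find(|e| {
            e.variant == variant
                && e.device.upgrade().is_some_and(|d| Arc::ptr_eq(&d, device))
        })
        .map(|e| e.pipeline.clone())
}

fn insert(device: &Arc<Device>, variant: u32, pipeline: Pipeline) {
    let cache = CACHE.get_or_init(|| Mutex::new(Vec::new()));
    cache.lock().unwrap().push(Entry {
        device: Arc::downgrade(device),
        variant,
        pipeline,
    });
}

fn main() {
    let d1 = Arc::new(Device);
    assert!(lookup(&d1, 64).is_none());
    insert(&d1, 64, Pipeline { variant: 64 });
    assert!(lookup(&d1, 64).is_some()); // identity match via Arc::ptr_eq
    drop(d1);
    let d2 = Arc::new(Device);          // may even reuse the old allocation
    assert!(lookup(&d2, 64).is_none()); // stale entry was purged, no false hit
}
```

The alternative the comment mentions, making the cache a field of `GpuContext` rather than a static, sidesteps the whole problem by tying pipeline lifetime to the device instance.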
📒 Files selected for processing (1)
src/gpu/pipeline.rs
The pipeline was recompiled from scratch on every solve: shader compilation, bind group layout, the whole thing. It is now cached in a static `OnceLock<Mutex<HashMap>>` keyed by device pointer and workgroup variant, so repeated solves skip straight to dispatch.