
perf(gpu): cache compute pipelines across repeated solves #100

Merged
oritwoen merged 2 commits into main from
autoresearch/puzzle-speed/02-pipeline-cache
Mar 31, 2026

Conversation

@oritwoen
Owner

The pipeline was recompiled from scratch on every solve - shader compilation, bind group layout, the whole thing. It is now cached in a static `OnceLock<Mutex<HashMap>>` keyed by device pointer and workgroup variant, so repeated solves skip straight to dispatch.

@oritwoen oritwoen self-assigned this Mar 31, 2026
@coderabbitai
Contributor

coderabbitai Bot commented Mar 31, 2026

📝 Walkthrough

Walkthrough

Added a process-wide pipeline cache in `KangarooPipeline::new` using `OnceLock<Mutex<HashMap<...>>>`, keyed by the device pointer address and `WorkgroupVariant`. `KangarooPipeline::new` looks up and returns a cloned cached pipeline if present; otherwise it creates the shader module, bind group layout, and pipeline, inserts the result into the cache, and returns it. `WorkgroupVariant` now derives `Hash`.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Pipeline Caching**<br>`src/gpu/pipeline.rs` | Introduced a global `PIPELINE_CACHE` (`OnceLock<Mutex<HashMap<(usize, WorkgroupVariant), KangarooPipeline>>>`). The cache key uses `Arc::as_ptr(&ctx.device) as usize` and `WorkgroupVariant` (which now derives `Hash`). `KangarooPipeline::new` now returns cached pipelines when found and inserts newly created ones. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Key concern: the cache key is a raw pointer address (`Arc::as_ptr(&ctx.device) as usize`). If a device `Arc` is dropped and a new device allocates at the same address, the cache may return a pipeline tied to a different (or freed) device, causing a use-after-free or invalid device references. Verify device lifetime guarantees, or use a stronger key (e.g., store an `Arc<Device>` handle in the cache key/value) to ensure the cached pipeline cannot outlive its device.

Poem

Pipelines sleep in a keyed little hive,
Recalled by an address to jump back alive.
Cache hums softly, no rebuild parade,
Just watch those pointers — or debts are made. 🎛️

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | Title clearly identifies the performance optimization: caching compute pipelines to avoid recompilation on repeated solves, which is the main change in the PR. |
| Description check | ✅ Passed | Description directly relates to the changeset, explaining the problem (pipeline recompilation on every solve) and the solution (a static cache keyed by device and variant). |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot added enhancement New feature or request performance Performance improvements labels Mar 31, 2026

@cubic-dev-ai cubic-dev-ai Bot left a comment


1 issue found across 1 file

Confidence score: 4/5

  • This looks safe to merge overall; the main issue is a low-to-moderate severity concurrency inefficiency rather than a functional bug.
  • In src/gpu/pipeline.rs, concurrent calls to new can both miss the cache and compile the same pipeline, causing duplicate compile cost under contention.
  • Pay close attention to src/gpu/pipeline.rs - cache check/insert split across compilation can lead to duplicate work in concurrent scenarios.
Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/gpu/pipeline.rs">

<violation number="1" location="src/gpu/pipeline.rs:41">
P2: Cache check and insert are split by compilation, so concurrent calls can both miss and compile the same pipeline. That means you still pay the shader compile cost twice when two threads hit `new` at the same time.</violation>
</file>
Architecture diagram
```mermaid
sequenceDiagram
    participant App as Solver / Client
    participant KP as KangarooPipeline
    participant Cache as PIPELINE_CACHE (Static)
    participant WGPU as wgpu / GPU Driver

    Note over App,WGPU: Request for Compute Pipeline (KangarooPipeline::new)

    App->>KP: new(ctx, variant)
    KP->>KP: Extract device pointer as key

    KP->>Cache: NEW: Lock and get(device_key, variant)

    alt NEW: Cache Hit (Pipeline exists)
        Cache-->>KP: Return cloned KangarooPipeline
        KP-->>App: Return cached pipeline (SKIP Compilation)
    else NEW: Cache Miss (First solve for device/variant)
        Cache-->>KP: None

        KP->>KP: Load shader sources (FIELD_WGSL, etc.)

        KP->>WGPU: create_bind_group_layout()
        WGPU-->>KP: BindGroupLayout

        KP->>WGPU: create_compute_pipeline()
        Note right of WGPU: Heavy operation: JIT Shader Compilation
        WGPU-->>KP: ComputePipeline

        KP->>Cache: NEW: Insert(device_key, variant, pipeline)
        KP-->>App: Return new pipeline
    end
```

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.

Comment thread src/gpu/pipeline.rs Outdated
Prevents concurrent threads from both missing the cache and compiling
the same shader twice.

@cubic-dev-ai cubic-dev-ai Bot left a comment


1 issue found across 1 file (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/gpu/pipeline.rs">

<violation number="1" location="src/gpu/pipeline.rs:41">
P2: The cache lock is held during full pipeline compilation, so unrelated cache misses are serialized behind one global mutex.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.

Comment thread src/gpu/pipeline.rs
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/gpu/pipeline.rs`:
- Around line 35-37: The static PIPELINE_CACHE stores KangarooPipeline entries
containing Arc<ComputePipeline> and Arc<BindGroupLayout>, which prevents GPU
objects from ever being dropped across different GpuContext/device lifetimes;
change the cache to hold Weak references (e.g., Weak<ComputePipeline> and
Weak<BindGroupLayout>) and update cache access code in functions that
create/load KangarooPipeline to attempt .upgrade(), recreate and replace entries
when upgrade fails, and periodically purge expired entries, or alternatively
refactor the cache out of the static and make it device-scoped on the GpuContext
so pipelines are tied to the device instance.
- Around line 39-43: The cache keying by raw pointer (device_key via
Arc::as_ptr) is unsafe across device drops; change PIPELINE_CACHE to key by a
Weak<wgpu::Device> (or store Weak in the map entries) and, on lookup in the
pipeline retrieval path (where device_key, PIPELINE_CACHE, guard.get, and
variant are used), iterate/clean the map: upgrade each Weak to Arc, drop dead
Weaks, and compare candidate devices with the current ctx.device using
Arc::ptr_eq before returning the cached pipeline; if no match, insert a new
entry storing Arc::downgrade(&ctx.device) alongside the pipeline so stale
pointer reuse cannot return resources from a different device.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4d68a041-b094-41f8-9eb7-631598ecbea7

📥 Commits

Reviewing files that changed from the base of the PR and between 98457f7 and 5d1ac42.

📒 Files selected for processing (1)
  • src/gpu/pipeline.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: cubic · AI code reviewer
🧰 Additional context used
📓 Path-based instructions (1)
src/gpu/**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

Never add y_parity or symClass to GpuDistinguishedPoint; collision resolution must be done CPU-side via compute_candidate_keys()

Files:

  • src/gpu/pipeline.rs
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: oritwoen/kangaroo PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-20T07:57:41.039Z
Learning: Applies to src/shaders/kangaroo_affine.wgsl : Implement main GPU compute logic in WGSL shaders located at src/shaders/kangaroo_affine.wgsl
📚 Learning: 2026-02-20T07:57:41.039Z
Learnt from: CR
Repo: oritwoen/kangaroo PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-20T07:57:41.039Z
Learning: Applies to src/gpu_crypto/context.rs : Handle GPU context setup including wgpu adapter and device creation in src/gpu_crypto/context.rs

Applied to files:

  • src/gpu/pipeline.rs
📚 Learning: 2026-02-20T07:57:41.039Z
Learnt from: CR
Repo: oritwoen/kangaroo PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-20T07:57:41.039Z
Learning: Applies to src/gpu_crypto/context.rs : Detect and warn about software renderer fallbacks (llvmpipe/SwiftShader)

Applied to files:

  • src/gpu/pipeline.rs
📚 Learning: 2026-02-20T07:57:41.039Z
Learnt from: CR
Repo: oritwoen/kangaroo PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-20T07:57:41.039Z
Learning: Applies to src/shaders/*.wgsl : Use workgroup size of 64 threads in compute shaders (hardcoded)

Applied to files:

  • src/gpu/pipeline.rs

Comment thread src/gpu/pipeline.rs
@oritwoen oritwoen merged commit 3347803 into main Mar 31, 2026
4 checks passed
@oritwoen oritwoen deleted the autoresearch/puzzle-speed/02-pipeline-cache branch March 31, 2026 17:01
