
perf(gpu): cache compute pipelines across repeated solves #100

Merged
oritwoen merged 2 commits into main from
autoresearch/puzzle-speed/02-pipeline-cache
Mar 31, 2026

Conversation

@oritwoen
Owner

The pipeline was recompiled from scratch on every solve - shader compilation, bind group layout, the whole thing. It is now cached in a static `OnceLock<Mutex<HashMap>>` keyed by device pointer and workgroup variant, so repeated solves skip straight to dispatch.

@oritwoen oritwoen self-assigned this Mar 31, 2026
@coderabbitai
Contributor

coderabbitai Bot commented Mar 31, 2026

📝 Walkthrough

Walkthrough

Added a process-wide pipeline cache in `KangarooPipeline::new` using `OnceLock<Mutex<HashMap<...>>>`, keyed by the device pointer address and `WorkgroupVariant`. `KangarooPipeline::new` looks up and returns a cloned cached pipeline if present; otherwise it creates the shader module, bind group layout, and pipeline, inserts the result into the cache, and returns it. `WorkgroupVariant` now derives `Hash`.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Pipeline Caching**<br>`src/gpu/pipeline.rs` | Introduced a global `PIPELINE_CACHE` (`OnceLock<Mutex<HashMap<(usize, WorkgroupVariant), KangarooPipeline>>>`). The cache key uses `Arc::as_ptr(&ctx.device) as usize` and `WorkgroupVariant` (which now derives `Hash`). `KangarooPipeline::new` now returns cached pipelines when found and inserts newly created ones. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Key concern: the cache key is a raw pointer address (`Arc::as_ptr(&ctx.device) as usize`). If a device `Arc` is dropped and a new device allocates at the same address, the cache may return a pipeline tied to a different (or freed) device, causing a use-after-free or invalid device references. Verify device lifetime guarantees, or use a stronger key (e.g., store an `Arc<Device>` handle in the cache key/value) to ensure the cached pipeline cannot outlive its device.

Poem

Pipelines sleep in a keyed little hive,
Recalled by an address to jump back alive.
Cache hums softly, no rebuild parade,
Just watch those pointers — or debts are made. 🎛️

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | Title clearly identifies the performance optimization: caching compute pipelines to avoid recompilation on repeated solves, which is the main change in the PR. |
| Description check | ✅ Passed | Description directly relates to the changeset, explaining the problem (pipeline recompilation on every solve) and the solution (a static cache keyed by device and variant). |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot added enhancement New feature or request performance Performance improvements labels Mar 31, 2026

@cubic-dev-ai cubic-dev-ai Bot left a comment


1 issue found across 1 file

Confidence score: 4/5

  • This looks safe to merge overall; the main issue is a low-to-moderate severity concurrency inefficiency rather than a functional bug.
  • In src/gpu/pipeline.rs, concurrent calls to new can both miss the cache and compile the same pipeline, causing duplicate compile cost under contention.
  • Pay close attention to src/gpu/pipeline.rs - cache check/insert split across compilation can lead to duplicate work in concurrent scenarios.
Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/gpu/pipeline.rs">

<violation number="1" location="src/gpu/pipeline.rs:41">
P2: Cache check and insert are split by compilation, so concurrent calls can both miss and compile the same pipeline. That means you still pay the shader compile cost twice when two threads hit `new` at the same time.</violation>
</file>
Architecture diagram
```mermaid
sequenceDiagram
    participant App as Solver / Client
    participant KP as KangarooPipeline
    participant Cache as PIPELINE_CACHE (Static)
    participant WGPU as wgpu / GPU Driver

    Note over App,WGPU: Request for Compute Pipeline (KangarooPipeline::new)

    App->>KP: new(ctx, variant)
    KP->>KP: Extract device pointer as key

    KP->>Cache: NEW: Lock and get(device_key, variant)

    alt NEW: Cache Hit (Pipeline exists)
        Cache-->>KP: Return cloned KangarooPipeline
        KP-->>App: Return cached pipeline (SKIP Compilation)
    else NEW: Cache Miss (First solve for device/variant)
        Cache-->>KP: None

        KP->>KP: Load shader sources (FIELD_WGSL, etc.)

        KP->>WGPU: create_bind_group_layout()
        WGPU-->>KP: BindGroupLayout

        KP->>WGPU: create_compute_pipeline()
        Note right of WGPU: Heavy operation: JIT Shader Compilation
        WGPU-->>KP: ComputePipeline

        KP->>Cache: NEW: Insert(device_key, variant, pipeline)
        KP-->>App: Return new pipeline
    end
```

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.

Comment thread src/gpu/pipeline.rs Outdated
Prevents concurrent threads from both missing the cache and compiling
the same shader twice.

@cubic-dev-ai cubic-dev-ai Bot left a comment


1 issue found across 1 file (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/gpu/pipeline.rs">

<violation number="1" location="src/gpu/pipeline.rs:41">
P2: The cache lock is held during full pipeline compilation, so unrelated cache misses are serialized behind one global mutex.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.

Comment thread src/gpu/pipeline.rs
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/gpu/pipeline.rs`:
- Around line 35-37: The static PIPELINE_CACHE stores KangarooPipeline entries
containing Arc<ComputePipeline> and Arc<BindGroupLayout>, which prevents GPU
objects from ever being dropped across different GpuContext/device lifetimes;
change the cache to hold Weak references (e.g., Weak<ComputePipeline> and
Weak<BindGroupLayout>) and update cache access code in functions that
create/load KangarooPipeline to attempt .upgrade(), recreate and replace entries
when upgrade fails, and periodically purge expired entries, or alternatively
refactor the cache out of the static and make it device-scoped on the GpuContext
so pipelines are tied to the device instance.
- Around line 39-43: The cache keying by raw pointer (device_key via
Arc::as_ptr) is unsafe across device drops; change PIPELINE_CACHE to key by a
Weak<wgpu::Device> (or store Weak in the map entries) and, on lookup in the
pipeline retrieval path (where device_key, PIPELINE_CACHE, guard.get, and
variant are used), iterate/clean the map: upgrade each Weak to Arc, drop dead
Weaks, and compare candidate devices with the current ctx.device using
Arc::ptr_eq before returning the cached pipeline; if no match, insert a new
entry storing Arc::downgrade(&ctx.device) alongside the pipeline so stale
pointer reuse cannot return resources from a different device.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4d68a041-b094-41f8-9eb7-631598ecbea7

📥 Commits

Reviewing files that changed from the base of the PR and between 98457f7 and 5d1ac42.

📒 Files selected for processing (1)
  • src/gpu/pipeline.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: cubic · AI code reviewer
🧰 Additional context used
📓 Path-based instructions (1)
src/gpu/**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

Never add y_parity or symClass to GpuDistinguishedPoint; collision resolution must be done CPU-side via compute_candidate_keys()

Files:

  • src/gpu/pipeline.rs
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: oritwoen/kangaroo PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-20T07:57:41.039Z
Learning: Applies to src/shaders/kangaroo_affine.wgsl : Implement main GPU compute logic in WGSL shaders located at src/shaders/kangaroo_affine.wgsl
📚 Learning: 2026-02-20T07:57:41.039Z
Learnt from: CR
Repo: oritwoen/kangaroo PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-20T07:57:41.039Z
Learning: Applies to src/gpu_crypto/context.rs : Handle GPU context setup including wgpu adapter and device creation in src/gpu_crypto/context.rs

Applied to files:

  • src/gpu/pipeline.rs
📚 Learning: 2026-02-20T07:57:41.039Z
Learnt from: CR
Repo: oritwoen/kangaroo PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-20T07:57:41.039Z
Learning: Applies to src/gpu_crypto/context.rs : Detect and warn about software renderer fallbacks (llvmpipe/SwiftShader)

Applied to files:

  • src/gpu/pipeline.rs
📚 Learning: 2026-02-20T07:57:41.039Z
Learnt from: CR
Repo: oritwoen/kangaroo PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-20T07:57:41.039Z
Learning: Applies to src/shaders/*.wgsl : Use workgroup size of 64 threads in compute shaders (hardcoded)

Applied to files:

  • src/gpu/pipeline.rs

Comment thread src/gpu/pipeline.rs
@oritwoen oritwoen merged commit 3347803 into main Mar 31, 2026
4 checks passed
@oritwoen oritwoen deleted the autoresearch/puzzle-speed/02-pipeline-cache branch March 31, 2026 17:01
