
perf(gpu): narrow pipeline cache lock scope during compilation#101

Merged
oritwoen merged 1 commit into main from perf/narrow-pipeline-cache-lock on Mar 31, 2026

Conversation

@oritwoen
Owner

Summary

  • Drop the mutex guard immediately after cache lookup instead of holding it across the entire shader compilation
  • Re-acquire the lock only briefly for the final insert
  • Other threads can now read cached pipelines while a new variant is being compiled

Follow-up to #100.
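The locking pattern described in the summary can be sketched roughly as follows. The type and function names here are hypothetical stand-ins for illustration, not the actual `src/gpu/pipeline.rs` code:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Hypothetical stand-ins for the real pipeline types in src/gpu/pipeline.rs.
type VariantKey = u32;
pub struct Pipeline(pub VariantKey);

fn expensive_compile(key: VariantKey) -> Arc<Pipeline> {
    // Stands in for shader compilation; runs with no lock held.
    Arc::new(Pipeline(key))
}

pub fn get_or_compile(
    cache: &Mutex<HashMap<VariantKey, Arc<Pipeline>>>,
    key: VariantKey,
) -> Arc<Pipeline> {
    // Read path: the guard is a temporary and is dropped at the end of
    // this statement, so the lock is held only for the lookup itself.
    if let Some(p) = cache.lock().unwrap().get(&key).cloned() {
        return p;
    }
    // Compile without the lock, so other threads can keep reading the cache.
    let pipeline = expensive_compile(key);
    // Write path: re-acquire the lock briefly for the final insert.
    cache.lock().unwrap().insert(key, Arc::clone(&pipeline));
    pipeline
}
```

Because the cached value is an `Arc`, the read path can clone cheaply and release the lock before doing anything with the pipeline.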

Drop the mutex guard after cache lookup so shader compilation
does not block other threads from reading cached pipelines.
The lock is only re-acquired briefly for the final insert.
@coderabbitai
Contributor

coderabbitai Bot commented Mar 31, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 644557f4-0287-4bb1-869f-4f37525f929c

📥 Commits

Reviewing files that changed from the base of the PR and between 3347803 and 6b2c711.

📒 Files selected for processing (1)
  • src/gpu/pipeline.rs
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: cubic · AI code reviewer
🧰 Additional context used
📓 Path-based instructions (1)
src/gpu/**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

Never add y_parity or symClass to GpuDistinguishedPoint; collision resolution must be done CPU-side via compute_candidate_keys()

Files:

  • src/gpu/pipeline.rs
🧠 Learnings (3)
📓 Common learnings
Learnt from: oritwoen
Repo: oritwoen/kangaroo PR: 100
File: src/gpu/pipeline.rs:35-37
Timestamp: 2026-03-31T16:28:59.172Z
Learning: The `kangaroo` project is a CLI solver, not a service. `GpuContext` is created once per process run, and the process exits after solving. There is no create/destroy cycle for GPU contexts, so a static `PIPELINE_CACHE` holding strong `Arc` references in `src/gpu/pipeline.rs` is intentional and safe for this use case.
📚 Learning: 2026-02-20T07:57:41.039Z
Learnt from: CR
Repo: oritwoen/kangaroo PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-20T07:57:41.039Z
Learning: Applies to src/gpu_crypto/context.rs : Detect and warn about software renderer fallbacks (llvmpipe/SwiftShader)

Applied to files:

  • src/gpu/pipeline.rs
🔇 Additional comments (2)
src/gpu/pipeline.rs (2)

41-48: TOCTOU race: two threads can both miss cache and compile the same variant.

Between the lookup dropping the lock (line 45) and the insert acquiring it again (line 174), another thread can also miss the cache for the same key. Both threads then compile the same shader, and the second insert wins.

For a CLI solver (per the learnings) this probably only causes redundant work during init, which is fine. If that's intentional, a brief comment would help future readers understand the trade-off. Otherwise, if you want a true single-compilation guarantee, you'd need something like the `entry()` API with `or_insert_with` while holding the lock.

Just confirming this is the intended behavior: redundant compilation is acceptable for the perf win on the read path?

Also applies to: 173-176
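For reference, the single-compilation alternative mentioned above could look something like this minimal sketch (hypothetical types, not the project's actual code). Note that it reintroduces exactly what the PR removes: the lock is held across the compile closure, blocking all other readers:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Hypothetical stand-ins; see the trade-off discussion above.
type VariantKey = u32;
pub struct Pipeline(pub VariantKey);

// Guarantees each variant is compiled exactly once, but blocks all other
// readers of the cache while the closure (the compile step) runs.
pub fn get_or_compile_once(
    cache: &Mutex<HashMap<VariantKey, Arc<Pipeline>>>,
    key: VariantKey,
) -> Arc<Pipeline> {
    let mut map = cache.lock().unwrap();
    Arc::clone(map.entry(key).or_insert_with(|| Arc::new(Pipeline(key))))
}
```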


41-48: Lock-and-release pattern looks good.

The inline cache.lock()...get()...cloned() drops the guard immediately after cloning, which is the right move for narrowing scope. Clean pattern.


📝 Walkthrough


Refactored cache locking in KangarooPipeline::new to drop lock guards immediately after use rather than holding them across control flow. The read path now acquires and releases the lock inline; the write path does the same for insertion. Logic flow unchanged.

Changes

Cohort / File(s) Summary
Pipeline cache lock refactoring
src/gpu/pipeline.rs
Replaced the persistent guard variable with chained temporary lock acquisitions. The read path (`lock().get().cloned()`) and write path (`lock().insert(...)`) now release the lock immediately after each operation instead of holding it across conditional branches.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

The main question: does dropping the lock early between the read-miss and the write introduce a race where two threads could both miss cache and create duplicate pipelines? Need to verify the creation cost is acceptable or if atomic compare-and-swap on insert would be safer. Also confirm the clone cost on read doesn't outweigh the lock scope reduction.
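The race asked about above can be demonstrated deterministically with a barrier that forces both threads past their cache lookup before either inserts (hypothetical key/value types, not the project's code): both threads "compile", and the second insert simply overwrites the first with an equivalent entry.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Barrier, Mutex};
use std::thread;

// Returns (number of redundant "compilations", final cache size).
pub fn double_miss_demo() -> (usize, usize) {
    let cache = Arc::new(Mutex::new(HashMap::<u32, Arc<u32>>::new()));
    let compiles = Arc::new(AtomicUsize::new(0));
    let barrier = Arc::new(Barrier::new(2));

    let handles: Vec<_> = (0..2)
        .map(|_| {
            let (cache, compiles, barrier) =
                (Arc::clone(&cache), Arc::clone(&compiles), Arc::clone(&barrier));
            thread::spawn(move || {
                // Narrow-scope read: the lock is dropped right after the lookup.
                let miss = cache.lock().unwrap().get(&0).is_none();
                // Both threads finish their lookup before either inserts,
                // so both deterministically observe a miss.
                barrier.wait();
                if miss {
                    compiles.fetch_add(1, Ordering::SeqCst); // redundant "compile"
                    cache.lock().unwrap().insert(0, Arc::new(0));
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let len = cache.lock().unwrap().len();
    (compiles.load(Ordering::SeqCst), len)
}
```

Both threads do redundant work, but the cache ends up in a consistent state with a single entry, which is why the race is benign for one-shot initialization.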


Suggested labels

performance, refactoring

Poem

🔒 Lock held tight? Nope, released quick,
Chains of ops, no guarding tricks,
Cache still works, contention less—
Just watch the race, mind the mess. 🏎️

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Title check — ✅ Passed: the title accurately describes the main change (narrowing the pipeline cache lock scope by releasing the guard immediately after lookup instead of holding it across shader compilation).
  • Description check — ✅ Passed: the description clearly explains the optimization (dropping the mutex guard after cache lookup, re-acquiring briefly for insert, enabling concurrent reads during compilation).



@coderabbitai coderabbitai Bot added the performance Performance improvements label Mar 31, 2026

@cubic-dev-ai cubic-dev-ai Bot left a comment


No issues found across 1 file

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.

Requires human review: Modifies concurrency logic and locking strategy for a global cache. While intended for performance, changes to shared state management require human review.

@oritwoen oritwoen merged commit a62ddb5 into main Mar 31, 2026
4 checks passed
@oritwoen oritwoen deleted the perf/narrow-pipeline-cache-lock branch March 31, 2026 19:45
