rt: improve spawn_blocking scalability with sharded queue #7757
Open
alex wants to merge 2 commits into tokio-rs:master from alex:claude/improve-spawn-blocking-perf-01A5VqgjoFsxUcvmP6eAjdTf
+357
−118
9537dda to
016f6ca
Compare
Contributor, Author:
(FreeBSD failures look unrelated.)
ADD-SP
reviewed
Dec 4, 2025
martin-g
reviewed
Dec 4, 2025
f4416fb to
21ff5ce
Compare
Member:
Please rebase to latest
21ff5ce to
694fa6b
Compare
martin-g
reviewed
Dec 5, 2025
694fa6b to
edd5e10
Compare
ADD-SP
reviewed
Dec 6, 2025
The blocking pool's task queue was protected by a single mutex, causing severe contention when many threads spawn blocking tasks concurrently. This resulted in nearly linear degradation: 16 concurrent threads took ~18x longer than a single thread.

Replace the single-mutex queue with a sharded queue that distributes tasks across 16 lock-protected shards. The implementation adapts to concurrency levels by using fewer shards when thread count is low, maintaining cache locality while avoiding contention at scale.

Benchmark results (spawning 100 batches of 16 tasks per thread; negative change means faster):

| Concurrency | Before   | After   | Change in time |
|-------------|----------|---------|----------------|
| 1 thread    | 13.3ms   | 17.8ms  | +34%           |
| 2 threads   | 26.0ms   | 20.1ms  | -23%           |
| 4 threads   | 45.4ms   | 27.5ms  | -39%           |
| 8 threads   | 111.5ms  | 20.3ms  | -82%           |
| 16 threads  | 247.8ms  | 22.4ms  | -91%           |

The overhead at 1 thread (+34%) is due to the sharded infrastructure, but this is acceptable given the dramatic improvement at higher concurrency, where the original design suffered from lock contention.
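For readers who want a concrete picture of the sharding idea described above, here is a minimal standalone sketch. It is not the code in this PR: `ShardedQueue`, `set_concurrency_hint`, the round-robin push, and the "one shard per thread" policy are all hypothetical names and assumptions chosen for illustration. Only the shard count of 16 comes from the description.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Upper bound on shards; 16 matches the PR description.
const MAX_SHARDS: usize = 16;

/// Hypothetical sharded queue (illustration only, not tokio's type).
/// Each shard has its own lock, so concurrent pushes rarely collide.
pub struct ShardedQueue<T> {
    shards: Vec<Mutex<VecDeque<T>>>,
    /// Round-robin cursor used to pick a shard on push.
    next_push: AtomicUsize,
    /// Number of shards currently used for pushes (1..=MAX_SHARDS).
    active: AtomicUsize,
}

impl<T> ShardedQueue<T> {
    pub fn new() -> Self {
        Self {
            shards: (0..MAX_SHARDS)
                .map(|_| Mutex::new(VecDeque::new()))
                .collect(),
            next_push: AtomicUsize::new(0),
            // Start with one shard: behaves like a single queue.
            active: AtomicUsize::new(1),
        }
    }

    /// Assumed adaptation policy: roughly one shard per spawning
    /// thread, capped at MAX_SHARDS. The real heuristic may differ.
    pub fn set_concurrency_hint(&self, threads: usize) {
        self.active
            .store(threads.clamp(1, MAX_SHARDS), Ordering::Relaxed);
    }

    /// Push onto one of the active shards, chosen round-robin.
    pub fn push(&self, item: T) {
        let active = self.active.load(Ordering::Relaxed);
        let idx = self.next_push.fetch_add(1, Ordering::Relaxed) % active;
        self.shards[idx].lock().unwrap().push_back(item);
    }

    /// Pop from the first non-empty shard, scanning from `start`.
    /// All shards are scanned so items pushed while more shards
    /// were active are still found.
    pub fn pop(&self, start: usize) -> Option<T> {
        for offset in 0..MAX_SHARDS {
            let idx = (start + offset) % MAX_SHARDS;
            if let Some(item) = self.shards[idx].lock().unwrap().pop_front() {
                return Some(item);
            }
        }
        None
    }

    /// Total number of queued items across all shards.
    pub fn len(&self) -> usize {
        self.shards.iter().map(|s| s.lock().unwrap().len()).sum()
    }
}
```

Note that ordering in this sketch is FIFO only within a shard, not across the whole queue. With a single active shard, every push takes the same lock, which is the single-mutex behavior the description says the old queue had.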
edd5e10 to
126cb78
Compare
ADD-SP
reviewed
Dec 6, 2025
ADD-SP
reviewed
Dec 6, 2025
Labels
A-tokio
Area: The main tokio crate
M-blocking
Module: tokio/task/blocking
T-performance
Topic: performance and benchmarks
(Notwithstanding that this shows as a commit from claude, every line is human reviewed. If there's a mistake, it's Alex's fault.)