Redesign IO threading communication model #2909
Summary
This PR redesigns the IO threading communication model, replacing the inefficient client-list polling approach with a high-performance, lock-free queue architecture. This change improves throughput by 8–17% across various workloads and lays the groundwork for offloading command execution to IO threads in follow-up PRs.
Performance Comparison: Unstable vs New IO Queues
Motivation
The previous IO model had several limitations that created performance bottlenecks:

- Completed IO jobs were collected by polling per-client lists (`clients_pending_io_read` / `clients_pending_io_write`), wasting main-thread cycles.
- Each job was pinned to a specific IO thread through that thread's SPSC queue, so load could not be balanced across threads.
- The main thread busy-waited for an IO thread to finish with a client before that client could be freed.
The Solution
To address these inefficiencies, this PR replaces the current model, in which each IO thread has a single SPSC queue, with three specialized queues that handle communication and load balancing more effectively.
1. Main > IO: Shared Queue (Single Producer Multi Consumer)
A single shared queue from the main thread to the IO threads. Any idle IO thread can pick up the next job, so work is balanced across threads automatically.
2. IO > Main: The Response Channel (MPSC Queue)
We replaced the old polling loop with a response queue: IO threads push completed jobs into a single Multi-Producer Single-Consumer queue that the main thread drains.
3. Main > IO (Thread-Specific): Private Inbox (SPSC Queue)
We kept the existing Single-Producer Single-Consumer (SPSC) queues for tasks that must happen on a specific thread (like freeing memory allocated by that thread). IO threads always check their private inbox before looking at the shared queue.
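Roughly, the topology and the IO-thread drain order look like the sketch below; every name in it is an illustrative stand-in, not an identifier from this PR.

```c
/* Illustrative sketch of the three-channel topology; all names are
 * stand-ins, not the PR's actual identifiers. */
typedef struct spmc_queue spmc_queue; /* single producer, multi consumer */
typedef struct mpsc_queue mpsc_queue; /* multi producer, single consumer */
typedef struct spsc_queue spsc_queue; /* single producer, single consumer */

#define IO_THREADS_MAX 16 /* arbitrary cap for the sketch */

typedef struct io_comm {
    spmc_queue *shared_jobs;                   /* 1. main -> any IO thread */
    mpsc_queue *responses;                     /* 2. IO threads -> main */
    spsc_queue *private_inbox[IO_THREADS_MAX]; /* 3. main -> one specific thread */
} io_comm;

/* Assumed queue/job primitives for the sketch. */
extern void *spscDequeue(spsc_queue *q);
extern void *spmcDequeue(spmc_queue *q);
extern void mpscEnqueue(mpsc_queue *q, void *job);
extern void *runJob(void *job); /* returns a response job, or NULL */

/* IO-thread loop: always drain the private inbox before taking shared
 * work, then report completions through the response channel. */
static void ioThreadLoop(io_comm *comm, int my_id) {
    for (;;) {
        void *job;
        while ((job = spscDequeue(comm->private_inbox[my_id])) != NULL) {
            void *done = runJob(job);
            if (done) mpscEnqueue(comm->responses, done);
        }
        if ((job = spmcDequeue(comm->shared_jobs)) != NULL) {
            void *done = runJob(job);
            if (done) mpscEnqueue(comm->responses, done);
        }
    }
}
```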
Changes Required
Async client release
The main thread no longer busy-waits for IO threads to finish with a client. Since the client must be popped from the multi-producer queue before it can be released, clients with pending IO are now marked for asynchronous closure.
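A minimal sketch of the release path, assuming a pending-IO check and helper names that are not from this PR (only the `close_asap` marking and the freed/async-close return convention described below are):

```c
/* Sketch: a client still referenced by an in-flight IO job cannot be
 * freed synchronously; only close_asap and the 1/0 return convention
 * come from the PR text, the helpers are assumed. */
typedef struct client client;

extern int clientHasPendingIO(client *c);   /* job still queued or running? */
extern void markClientCloseASAP(client *c); /* sets close_asap */
extern void releaseClient(client *c);       /* actually frees the memory */

/* Returns 1 if the client was freed now, 0 if it was scheduled for
 * asynchronous closure. */
static int freeClientSketch(client *c) {
    if (clientHasPendingIO(c)) {
        markClientCloseASAP(c); /* freed later, after the job is popped */
        return 0;
    }
    releaseClient(c);
    return 1;
}
```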
client eviction logic
Updated `evictClients()` to account for memory pending release (clients marked `close_asap`). `freeClient()` now returns a status code (1 for freed, 0 for async close) so the eviction loop does not over-evict by ignoring memory that is about to be reclaimed.
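A sketch of the adjusted eviction accounting under assumed helper names and memory limit; the PR only specifies that memory pending release is counted so the loop does not over-evict:

```c
/* Sketch of an eviction loop that counts memory pending async release so
 * it does not keep evicting for memory that is already about to be
 * reclaimed; helper names and the limit are illustrative. */
#include <stddef.h>

typedef struct client client;

extern size_t totalClientsMemoryUsage(void);
extern size_t clientMemoryUsage(client *c);
extern client *pickEvictionVictim(void);
extern int freeClientSketch(client *c); /* 1 = freed now, 0 = async close */

static void evictClientsSketch(size_t limit) {
    size_t pending_release = 0;
    while (totalClientsMemoryUsage() - pending_release > limit) {
        client *victim = pickEvictionVictim();
        if (!victim) break;
        size_t mem = clientMemoryUsage(victim);
        if (freeClientSketch(victim) == 0) {
            /* Marked close_asap: memory not reclaimed yet, but counted
             * so we do not over-evict. */
            pending_release += mem;
        }
    }
}
```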
events-per-io-thread config
Replaced the `events-per-io-thread` configuration with `io-threads-always-active`, as we no longer track events. Since this config is used only for tests, no backward-compatibility issue arises.
packed job instead of handlers
Jobs are now represented as tagged pointers (using the lower 3 bits for the job type) instead of separate `{handler, data}` structs. This reduces memory overhead and allows jobs to be passed through the queues as single pointers.
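A small sketch of the tagging scheme: the 3-bit tag is from the PR text, while the enum values and helper names are illustrative; it assumes job payloads are at least 8-byte aligned so the low bits are free.

```c
/* Tagged-pointer jobs: the low 3 bits carry the job type, the rest the
 * payload pointer. Assumes payloads are >= 8-byte aligned. */
#include <stdint.h>

typedef enum { JOB_READ = 0, JOB_WRITE = 1, JOB_POLL = 2, JOB_FREE = 3 } job_type;

#define JOB_TYPE_MASK ((uintptr_t)0x7)

static inline void *packJob(void *payload, job_type type) {
    return (void *)((uintptr_t)payload | (uintptr_t)type);
}

static inline job_type jobType(void *job) {
    return (job_type)((uintptr_t)job & JOB_TYPE_MASK);
}

static inline void *jobPayload(void *job) {
    return (void *)((uintptr_t)job & ~JOB_TYPE_MASK);
}
```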
head caching in spsc queue
The SPSC queue now caches the `head` index on the producer side (`head_cache`) to avoid frequent atomic loads. The producer only refreshes from the atomic `head` when the cache indicates the queue might be full, reducing cross-thread cache-line bouncing; see the sketch after the next item.
deferred commit in SPSC queue
`spscEnqueue()` supports batching via a `commit` flag. Multiple jobs can be enqueued with `commit=false`, then flushed with a single `spscCommit()` call, reducing atomic operations and cache-line bouncing.
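Here is a combined sketch of both producer-side optimizations, assuming a power-of-two ring with monotonically increasing indices; `spscEnqueue()`, `spscCommit()`, and `head_cache` are named in the PR text, everything else is an assumption.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SPSC_CAPACITY 1024 /* power of two: mask instead of modulo */

typedef struct spsc_queue {
    _Atomic size_t head;   /* consumer position, read cross-thread */
    _Atomic size_t tail;   /* producer position published to the consumer */
    size_t head_cache;     /* producer-local snapshot of head */
    size_t tail_local;     /* producer position including uncommitted jobs */
    void *items[SPSC_CAPACITY];
} spsc_queue;

/* Enqueue one job; publish to the consumer only when commit is true. */
static bool spscEnqueue(spsc_queue *q, void *job, bool commit) {
    size_t next = q->tail_local + 1;
    /* Fast path: trust the cached head; refresh from the atomic head only
     * when the cache suggests the queue might be full. */
    if (next - q->head_cache > SPSC_CAPACITY) {
        q->head_cache = atomic_load_explicit(&q->head, memory_order_acquire);
        if (next - q->head_cache > SPSC_CAPACITY) return false; /* full */
    }
    q->items[q->tail_local & (SPSC_CAPACITY - 1)] = job;
    q->tail_local = next;
    if (commit)
        atomic_store_explicit(&q->tail, q->tail_local, memory_order_release);
    return true;
}

/* Flush jobs enqueued with commit=false using one atomic store. */
static void spscCommit(spsc_queue *q) {
    atomic_store_explicit(&q->tail, q->tail_local, memory_order_release);
}
```

A batch of N jobs then costs a single release store on `tail` instead of N.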
rollback on fullness check failure
When `spmcEnqueue()` fails due to a full queue, the client state is rolled back (e.g., `io_write_state` reset to `CLIENT_IDLE`). This removes the need for an expensive `isFull` check before every enqueue: we just attempt the enqueue and revert if it fails.
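A sketch of the attempt-then-rollback pattern with stand-in types; `spmcEnqueue()`, `io_write_state`, and `CLIENT_IDLE` appear in the PR text, while the queue handle and job packing are assumed.

```c
#include <stdbool.h>

typedef enum { CLIENT_IDLE, CLIENT_PENDING_IO } io_state;
typedef struct client { io_state io_write_state; } client;
typedef struct spmc_queue spmc_queue;

extern bool spmcEnqueue(spmc_queue *q, void *job); /* false when full */
extern void *packWriteJob(client *c);
extern spmc_queue *shared_jobs;

#define C_OK 0
#define C_ERR -1

static int trySendWrite(client *c) {
    c->io_write_state = CLIENT_PENDING_IO; /* optimistic transition */
    if (!spmcEnqueue(shared_jobs, packWriteJob(c))) {
        c->io_write_state = CLIENT_IDLE;   /* queue full: roll back */
        return C_ERR;
    }
    return C_OK;
}
```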
epoll offloading via SPSC at high thread counts
When `active_io_threads_num > 9`, poll jobs are sent to per-thread SPSC queues (round-robin). Since threads check their private queue first, this ensures poll jobs are processed promptly without waiting behind jobs in the shared SPMC queue.
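A sketch of how the routing decision might look; the `> 9` threshold and `active_io_threads_num` are from the PR text, the rest is illustrative.

```c
#include <stdbool.h>

typedef struct spsc_queue spsc_queue;
typedef struct spmc_queue spmc_queue;

extern int active_io_threads_num;
extern spsc_queue *private_inbox[];
extern spmc_queue *shared_jobs;
extern bool spscEnqueueJob(spsc_queue *q, void *job);
extern bool spmcEnqueueJob(spmc_queue *q, void *job);

/* Route a poll job: above the threshold, round-robin to per-thread
 * private inboxes (drained before the shared queue); otherwise use the
 * shared SPMC queue. */
static void dispatchPollJob(void *job) {
    static unsigned rr = 0; /* round-robin cursor; main thread is sole producer */
    if (active_io_threads_num > 9) {
        unsigned t = rr++ % (unsigned)active_io_threads_num;
        spscEnqueueJob(private_inbox[t], job);
    } else {
        spmcEnqueueJob(shared_jobs, job);
    }
}
```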
avoid offload write before read comes back
Added a check `if (c->io_read_state == CLIENT_PENDING_IO) return C_OK;` in `trySendWriteToIOThreads()`. In the previous per-thread SPSC implementation, we could send consecutive read and write jobs for the same client knowing a single thread would handle them in order. With the shared SPMC queue, different threads may pick up the jobs, so we must wait for the read to complete before sending a write, to avoid two threads handling the same client.
removing `pending_read_list_node` from client and `clients_pending_io_read/write` lists from server
Removed `pending_read_list_node` from the `client` struct, and the `clients_pending_io_read` / `clients_pending_io_write` lists from `valkeyServer`, as the new MPSC response queue eliminates the need for these tracking structures.
added inst metrics for pending io jobs
Added an `instantaneous_io_pending_jobs` metric via `STATS_METRIC_IO_WAIT` to track average queue depth over time.
added stat for current active threads number
Added `active_io_threads_num` to the INFO stats output for better visibility.
added internal inst metric for main-thread cpu (non apple compliant)
Added `STATS_METRIC_MAIN_THREAD_CPU_SYS` to track main-thread CPU usage via `getrusage(RUSAGE_THREAD)`. This powers the "ignition" policy: when CPU exceeds 30%, the first IO thread is activated. `RUSAGE_THREAD` is Linux-specific, so macOS falls back to event-count heuristics.
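A minimal sketch of the sampling primitive, assuming glibc on Linux (`RUSAGE_THREAD` needs `_GNU_SOURCE`); the averaging and the 30% comparison around it are not shown.

```c
#define _GNU_SOURCE
#include <sys/resource.h>

/* Main-thread CPU time (user + system) in microseconds. */
static long long mainThreadCpuMicros(void) {
    struct rusage ru;
    if (getrusage(RUSAGE_THREAD, &ru) != 0) return -1;
    return (long long)(ru.ru_utime.tv_sec + ru.ru_stime.tv_sec) * 1000000LL +
           (ru.ru_utime.tv_usec + ru.ru_stime.tv_usec);
}
```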
added stat for pending read and writes for io
Added `io_threaded_reads_pending` and `io_threaded_writes_pending` stats to track how many read/write jobs are currently in flight to IO threads.
added volatile for crashed
Changed `server.crashed` from `int` to `volatile int` so the crash flag is immediately visible across threads, allowing IO threads to detect a crash and stop sending responses back to the main thread, avoiding deadlock on crash.

Co-authored-by: Dan Touitou [email protected]
Signed-off-by: Uri Yagelnik [email protected]