Fix race condition in Lambda+LocalServer causing NIOAsyncWriter fatal error (Bug #635)#636
Fix race condition in Lambda+LocalServer causing NIOAsyncWriter fatal error (Bug #635)#636
Conversation
… error This fixes a known issue with NIO's async server channel API where cancellation can cause accepted connections to be dropped before being read from the async stream, resulting in NIOAsyncWriter being deallocated without finish() being called. The fix replaces the async bind() API with the traditional callback-based childChannelInitializer, handling each connection immediately in a Task.detached to avoid the async stream cancellation race. Fixes #635
There was a problem hiding this comment.
Pull request overview
This PR addresses a race condition in the Lambda local server that causes a fatal error: "Deinited NIOAsyncWriter without calling finish()". The fix switches from NIO's async bind() API to the traditional callback-based childChannelInitializer approach, eliminating the problematic async stream that could drop unread channels during cancellation.
Changes:
- Replaced async bind() with callback-based childChannelInitializer that spawns detached tasks
- Simplified server shutdown logic by removing the connection iteration loop
- Removed withTaskCancellationHandler wrapper from handleConnection since detached tasks handle cancellation differently
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Task.detached { | ||
| await server.handleConnection(channel: asyncChannel, logger: logger) | ||
| } |
There was a problem hiding this comment.
Using Task.detached means connection handling tasks are not tracked or awaited. When the server shuts down, there's no mechanism to wait for these detached tasks to complete. While this solves the NIOAsyncWriter race condition, it could lead to abrupt connection terminations during shutdown. Consider documenting this limitation or tracking these tasks in a collection if graceful connection shutdown is important for local testing scenarios.
… error (Bug #635) (#636) On fast machines, the local Lambda server crashes with: ``` Fatal error: Deinited NIOAsyncWriter without calling finish() ``` This occurs in `NIOAsyncChannelHandler.channelActive()` when child connection channels are created. ## Root Cause This is a known issue with NIO's async server channel API (see [swift-nio#2637](apple/swift-nio#2637)). **The fundamental problem:** 1. The async `bind()` API creates `NIOAsyncChannel` instances for incoming connections 2. These channels are yielded through an async stream to the server loop 3. When the serving task is cancelled (or completes), the async stream iteration stops 4. Any channels that were accepted but not yet read from the stream are dropped 5. These unread channels never have `executeThenClose()` called on them 6. Their `NIOAsyncWriter` is deallocated without `finish()` being called → fatal error **Why graceful shutdown doesn't help:** Even closing the server channel gracefully doesn't eliminate the race - there's a timing window where: - A connection is accepted and queued in the async stream - The server task is cancelled or completes - The queued channel is never read and gets dropped IMHO, this is an inherent limitation of the `async bind()` API when combined with task cancellation. ## Solution I stopped using the `async bind()` API entirely. Instead, I use the traditional callback-based `childChannelInitializer`: 1. Create `NIOAsyncChannel` directly in `childChannelInitializer` (synchronous context) 2. Immediately spawn a `Task.detached` to handle the connection 3. Each connection is handled independently, not through a cancellable async stream 4. Detached tasks are not affected by task group cancellation 5. Every channel has `executeThenClose()` called immediately, preventing the writer from being dropped This approach avoids the async stream entirely, eliminating the race condition. ## Changes - Replaced `async bind()` with traditional `childChannelInitializer` - Each connection spawns a `Task.detached` that immediately calls `executeThenClose()` - Removed the connection iteration loop (no longer needed) - Server task now simply waits for the channel to close - Simplified shutdown logic since there's no async stream to drain ## Trade-offs - Uses `Task.detached` (unstructured concurrency) to bridge NIO's event-loop world with Swift concurrency - This is necessary until NIO provides a new bootstrap API that properly handles cancellation - Each connection is handled independently rather than through structured concurrency ## Testing Tested on fast machines where the race condition was reliably reproducible. The crash no longer occurs. ## References - [swift-nio#2637](apple/swift-nio#2637) - Known issue with async server channels and cancellation - [Comment from NIO maintainer](apple/swift-nio#2637 (comment)) - Recommends avoiding cancellation or using callback-based API Fixes #635 --------- Co-authored-by: Sebastien Stormacq <stormacq@amazon.lu>
On fast machines, the local Lambda server crashes with:
This occurs in
NIOAsyncChannelHandler.channelActive()when child connection channels are created.Root Cause
This is a known issue with NIO's async server channel API (see swift-nio#2637).
The fundamental problem:
bind()API createsNIOAsyncChannelinstances for incoming connectionsexecuteThenClose()called on themNIOAsyncWriteris deallocated withoutfinish()being called → fatal errorWhy graceful shutdown doesn't help:
Even closing the server channel gracefully doesn't eliminate the race - there's a timing window where:
IMHO, this is an inherent limitation of the
async bind()API when combined with task cancellation.Solution
I stopped using the
async bind()API entirely. Instead, I use the traditional callback-basedchildChannelInitializer:NIOAsyncChanneldirectly inchildChannelInitializer(synchronous context)Task.detachedto handle the connectionexecuteThenClose()called immediately, preventing the writer from being droppedThis approach avoids the async stream entirely, eliminating the race condition.
Changes
async bind()with traditionalchildChannelInitializerTask.detachedthat immediately callsexecuteThenClose()Trade-offs
Task.detached(unstructured concurrency) to bridge NIO's event-loop world with Swift concurrencyTesting
Tested on fast machines where the race condition was reliably reproducible. The crash no longer occurs.
References
Fixes #635