Skip to content

Commit bf397c0

Browse files
committed
Fix CLI Ctrl+C/SIGTERM shutdown: responsive cancellation and double-signal bug (#17588)
* Fix CLI Ctrl+C shutdown taking too long during AppHost startup - Make ConsoleCancellationManager.Cancel() non-blocking so Ctrl+C handler returns immediately - Pass cancellation token through to WaitAsync in CancelAppHostStartupAsync so Ctrl+C exits promptly instead of waiting for the full 5s timeout - Support second Ctrl+C for immediate force exit (Environment.Exit) - Add logging support to ConsoleCancellationManager via SetLogger() - Add comprehensive unit tests for ConsoleCancellationManager - Add integration test for RunCommand cancellation during startup timeout Fixes #17569 * Fix review comments: volatile logger, accurate comments, clearer warning * Clean up * Use WaitForSuccessPromptFailFastAsync in CLI E2E tests Replace WaitForSuccessPromptAsync with WaitForSuccessPromptFailFastAsync across all CLI E2E tests. The fail-fast variant detects error prompts immediately and throws instead of hanging for up to 500s waiting for a success prompt that will never arrive. This prevents 10+ minute CI timeouts when a command fails with a non-zero exit code. * Fix double-signal bug and consolidate test helpers - Fix ConsoleCancellationManager double-signal bug: move Console.CancelKeyPress to else branch so it only registers on platforms without PosixSignalRegistration. Previously both handlers fired for the same SIGINT, causing immediate force-kill. - Add SIGQUIT/Ctrl+Break registration for Windows parity. - Remove old WaitForSuccessPromptAsync (no fail-fast) and rename WaitForSuccessPromptFailFastAsync to WaitForSuccessPromptAsync. - Remove duplicate RunCommandFailFastAsync (identical to RunCommandAsync). * Fix stale comment and restrict SIGQUIT to Windows only * Remove unused legacy builder methods and update E2E skill docs * Don't force when debugging * More logging
1 parent 32e1d8e commit bf397c0

32 files changed

Lines changed: 394 additions & 227 deletions

.github/skills/cli-e2e-testing/SKILL.md

Lines changed: 3 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -270,10 +270,10 @@ await auto.WaitUntilAsync(
270270

271271
| Method | Description |
272272
|--------|-------------|
273-
| `WaitForSuccessPromptAsync(counter, timeout?)` | Waits for `[N OK] $ ` prompt and increments counter |
273+
| `WaitForSuccessPromptAsync(counter, timeout?)` | Waits for `[N OK] $ ` prompt, fails immediately if error prompt appears, and increments counter |
274274
| `WaitForAnyPromptAsync(counter, timeout?)` | Waits for any prompt (`OK` or `ERR`) and increments counter |
275275
| `WaitForErrorPromptAsync(counter, timeout?)` | Waits for `[N ERR:code] $ ` prompt and increments counter |
276-
| `WaitForSuccessPromptFailFastAsync(counter, timeout?)` | Waits for success prompt, fails immediately if error prompt appears |
276+
| `RunCommandAsync(command, counter, timeout?)` | Types a command, presses Enter, and waits for success prompt (fails fast on error) |
277277
| `DeclineAgentInitPromptAsync()` | Declines the `aspire agent init` prompt if it appears |
278278
| `AspireNewAsync(projectName, counter, template?, useRedisCache?)` | Runs `aspire new` interactively, handling template selection, project name, output path, URLs, Redis, and test project prompts |
279279

@@ -301,8 +301,7 @@ The following extensions on `Hex1bTerminalInputSequenceBuilder` are still availa
301301
|--------|-------------|
302302
| `WaitForSuccessPrompt(counter, timeout?)` | *(legacy)* Waits for `[N OK] $ ` prompt and increments counter |
303303
| `PrepareEnvironment(workspace, counter)` | *(legacy)* Sets up custom prompt with command tracking |
304-
| `InstallAspireCliFromPullRequest(prNumber, counter)` | *(legacy)* Downloads and installs CLI from PR artifacts |
305-
| `SourceAspireCliEnvironment(counter)` | *(legacy)* Adds `~/.aspire/bin` to PATH |
304+
| `SourceAspireBundleEnvironment(counter)` | *(legacy)* Sources bundle PATH environment variables |
306305

307306
## DO: Use CellPatternSearcher for Output Detection
308307

src/Aspire.Cli/Commands/RunCommand.cs

Lines changed: 8 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -306,7 +306,7 @@ protected override async Task<CommandResult> ExecuteAsync(ParseResult parseResul
306306
catch (TimeoutException)
307307
{
308308
runActivity?.SetTag(TelemetryConstants.Tags.ErrorType, "startup_timeout");
309-
await CancelAppHostStartupAsync(runCancellationTokenSource, pendingRun).ConfigureAwait(false);
309+
await CancelAppHostStartupAsync(runCancellationTokenSource, pendingRun, cancellationToken).ConfigureAwait(false);
310310
return CreateStartupTimeoutResult(timeoutSeconds);
311311
}
312312

@@ -351,7 +351,7 @@ protected override async Task<CommandResult> ExecuteAsync(ParseResult parseResul
351351
catch (TimeoutException)
352352
{
353353
runActivity?.SetTag(TelemetryConstants.Tags.ErrorType, "startup_timeout");
354-
await CancelAppHostStartupAsync(runCancellationTokenSource, pendingRun).ConfigureAwait(false);
354+
await CancelAppHostStartupAsync(runCancellationTokenSource, pendingRun, cancellationToken).ConfigureAwait(false);
355355
return CreateStartupTimeoutResult(timeoutSeconds);
356356
}
357357

@@ -1124,15 +1124,18 @@ private TimeSpan GetRemainingStartupTimeout(long startupStartTimestamp, TimeSpan
11241124
return elapsed >= startupTimeout ? TimeSpan.Zero : startupTimeout - elapsed;
11251125
}
11261126

1127-
private async Task CancelAppHostStartupAsync(CancellationTokenSource runCancellationTokenSource, Task<int> pendingRun)
1127+
private async Task CancelAppHostStartupAsync(CancellationTokenSource runCancellationTokenSource, Task<int> pendingRun, CancellationToken cancellationToken)
11281128
{
11291129
runCancellationTokenSource.Cancel();
11301130

11311131
try
11321132
{
1133-
await pendingRun.WaitAsync(s_appHostStartupCancellationTimeout, _timeProvider).ConfigureAwait(false);
1133+
// The timeout is a safety net for the startup-timeout path (no Ctrl+C). When the user
1134+
// presses Ctrl+C, cancellationToken fires and WaitAsync exits immediately via the token
1135+
// rather than waiting for the full timeout duration.
1136+
await pendingRun.WaitAsync(s_appHostStartupCancellationTimeout, _timeProvider, cancellationToken).ConfigureAwait(false);
11341137
}
1135-
catch (OperationCanceledException) when (runCancellationTokenSource.IsCancellationRequested)
1138+
catch (OperationCanceledException) when (runCancellationTokenSource.IsCancellationRequested || cancellationToken.IsCancellationRequested)
11361139
{
11371140
}
11381141
catch (TimeoutException ex)

src/Aspire.Cli/ConsoleCancellationManager.cs

Lines changed: 92 additions & 23 deletions
Original file line numberDiff line numberDiff line change
@@ -1,26 +1,35 @@
11
// Licensed to the .NET Foundation under one or more agreements.
22
// The .NET Foundation licenses this file to you under the MIT license.
33

4+
using System.Diagnostics;
45
using System.Runtime.InteropServices;
6+
using Microsoft.Extensions.Logging;
7+
using Microsoft.Extensions.Logging.Abstractions;
58

69
namespace Aspire.Cli;
710

811
/// <summary>
912
/// Manages Ctrl+C, SIGINT, and SIGTERM signal handling with a shared CancellationTokenSource.
10-
/// After cancellation is requested, waits up to <c>processTerminationTimeout</c> for the running
11-
/// handler to complete before signaling forced termination via <see cref="ProcessTerminationCompletionSource"/>.
13+
/// After cancellation is requested, schedules an asynchronous timeout for the running handler
14+
/// to complete before signaling forced termination via <see cref="ProcessTerminationCompletionSource"/>.
15+
/// A second signal forces immediate termination without waiting for the timeout.
1216
/// Disposing this instance unregisters all signal handlers and disposes the token source.
1317
/// </summary>
1418
internal sealed class ConsoleCancellationManager : IDisposable
1519
{
20+
// Standard Unix exit codes: 128 + signal number (SIGINT=2, SIGTERM=15).
21+
// SigIntExitCode (130): used when the user presses Ctrl+C (SIGINT) or Ctrl+Break/SIGQUIT.
22+
// SigTermExitCode (143): used when the process receives SIGTERM (e.g. container stop, ProcessExit).
1623
private const int SigIntExitCode = 130;
1724
private const int SigTermExitCode = 143;
1825

1926
private readonly CancellationTokenSource _cts = new();
2027
private readonly TimeSpan _processTerminationTimeout;
2128
private readonly PosixSignalRegistration? _sigIntRegistration;
2229
private readonly PosixSignalRegistration? _sigTermRegistration;
30+
private readonly PosixSignalRegistration? _sigQuitRegistration;
2331
private readonly CancellationToken _token;
32+
private ILogger _logger;
2433
private Task<int>? _startedHandler;
2534
private int _cancelCalled;
2635

@@ -38,9 +47,16 @@ internal sealed class ConsoleCancellationManager : IDisposable
3847
/// </summary>
3948
internal void SetStartedHandler(Task<int> handler) => Volatile.Write(ref _startedHandler, handler);
4049

50+
/// <summary>
51+
/// Sets the logger instance used for diagnostic messages during signal handling.
52+
/// Call this once the logging infrastructure is available.
53+
/// </summary>
54+
internal void SetLogger(ILogger logger) => Volatile.Write(ref _logger, logger);
55+
4156
public ConsoleCancellationManager(TimeSpan processTerminationTimeout)
4257
{
4358
_processTerminationTimeout = processTerminationTimeout;
59+
_logger = NullLogger.Instance;
4460

4561
// Set to a field so getting the token doesn't error after dispose.
4662
_token = _cts.Token;
@@ -56,9 +72,22 @@ public ConsoleCancellationManager(TimeSpan processTerminationTimeout)
5672
{
5773
_sigIntRegistration = PosixSignalRegistration.Create(PosixSignal.SIGINT, OnPosixSignal);
5874
_sigTermRegistration = PosixSignalRegistration.Create(PosixSignal.SIGTERM, OnPosixSignal);
75+
76+
// SIGQUIT maps to CTRL_BREAK_EVENT on Windows. Register it to maintain parity with
77+
// Console.CancelKeyPress which handled both Ctrl+C and Ctrl+Break.
78+
// On Linux/macOS, SIGQUIT's default action produces a core dump which is useful for
79+
// debugging hung processes — don't intercept it there.
80+
if (OperatingSystem.IsWindows())
81+
{
82+
_sigQuitRegistration = PosixSignalRegistration.Create(PosixSignal.SIGQUIT, OnPosixSignal);
83+
}
84+
}
85+
else
86+
{
87+
// Fall back to Console.CancelKeyPress on platforms that don't support PosixSignalRegistration.
88+
Console.CancelKeyPress += OnCancelKeyPress;
5989
}
6090

61-
Console.CancelKeyPress += OnCancelKeyPress;
6291
AppDomain.CurrentDomain.ProcessExit += OnProcessExit;
6392
}
6493

@@ -69,7 +98,13 @@ public ConsoleCancellationManager(TimeSpan processTerminationTimeout)
6998
private void OnPosixSignal(PosixSignalContext context)
7099
{
71100
context.Cancel = true;
72-
Cancel(context.Signal == PosixSignal.SIGINT ? SigIntExitCode : SigTermExitCode);
101+
var exitCode = context.Signal switch
102+
{
103+
PosixSignal.SIGINT => SigIntExitCode,
104+
PosixSignal.SIGQUIT => SigIntExitCode,
105+
_ => SigTermExitCode
106+
};
107+
Cancel(exitCode);
73108
}
74109

75110
private void OnCancelKeyPress(object? sender, ConsoleCancelEventArgs e)
@@ -80,47 +115,81 @@ private void OnCancelKeyPress(object? sender, ConsoleCancelEventArgs e)
80115

81116
private void OnProcessExit(object? sender, EventArgs e) => Cancel(SigTermExitCode);
82117

83-
private void Cancel(int forcedTerminationExitCode)
118+
internal void Cancel(int forcedTerminationExitCode)
84119
{
85-
// Ensure only the first signal triggers cancellation logic; subsequent signals are no-ops.
86-
if (Interlocked.CompareExchange(ref _cancelCalled, 1, 0) != 0)
87-
{
88-
return;
89-
}
120+
var signalCount = Interlocked.Increment(ref _cancelCalled);
90121

91-
// Request cancellation so cooperative listeners can begin shutting down.
92-
try
122+
if (signalCount == 1)
93123
{
94-
_cts.Cancel();
124+
// First signal: request cooperative cancellation and schedule an async timeout
125+
// that will force-terminate if the handler doesn't complete in time.
126+
_logger.LogInformation("Termination signal received, requesting cancellation.");
127+
128+
try
129+
{
130+
_cts.Cancel();
131+
}
132+
catch (ObjectDisposedException)
133+
{
134+
// A signal can race with process shutdown after cancellation resources are disposed.
135+
return;
136+
}
137+
138+
// Schedule the forced-completion timeout asynchronously so the signal handler
139+
// returns immediately. This allows Program.Main's Task.WhenAny to observe
140+
// handlerTask completion without being blocked by the signal handler thread.
141+
_ = ForceTerminationAfterTimeoutAsync(forcedTerminationExitCode);
95142
}
96-
catch (ObjectDisposedException)
143+
else
97144
{
98-
// A signal can race with process shutdown after cancellation resources are disposed.
99-
return;
145+
// Second (or subsequent) signal: force immediate termination without waiting.
146+
_logger.LogWarning("Second termination signal received, forcing immediate exit.");
147+
_processTerminationCompletionSource.TrySetResult(forcedTerminationExitCode);
100148
}
149+
}
101150

151+
private async Task ForceTerminationAfterTimeoutAsync(int forcedTerminationExitCode)
152+
{
102153
try
103154
{
155+
// When a debugger is attached, don't force-terminate — the developer needs
156+
// unlimited time to step through cancellation/cleanup logic.
157+
if (Debugger.IsAttached)
158+
{
159+
return;
160+
}
161+
104162
var startedHandler = Volatile.Read(ref _startedHandler);
105163

106-
// Wait for the configured interval to allow graceful shutdown.
107-
if (startedHandler is null || !startedHandler.Wait(_processTerminationTimeout))
164+
if (startedHandler is not null)
108165
{
109-
// If the handler does not finish within configured time, use the completion
110-
// source to signal forced completion (preserving native exit code).
111-
_processTerminationCompletionSource.TrySetResult(forcedTerminationExitCode);
166+
// Give the handler a chance to finish gracefully within the configured timeout.
167+
// Task.WhenAny completes when either the handler or the delay finishes first,
168+
// without propagating exceptions from the losing task.
169+
// It's ok that this delay isn't cancellable. The process is ending.
170+
var timeoutTask = Task.Delay(_processTerminationTimeout);
171+
if (await Task.WhenAny(startedHandler, timeoutTask).ConfigureAwait(false) == startedHandler)
172+
{
173+
// Handler finished within the timeout; no forced termination needed.
174+
return;
175+
}
112176
}
177+
178+
_logger.LogWarning("Handler did not complete within {Timeout}s, forcing termination.", _processTerminationTimeout.TotalSeconds);
179+
_processTerminationCompletionSource.TrySetResult(forcedTerminationExitCode);
113180
}
114-
catch (AggregateException)
181+
catch (Exception)
115182
{
116-
// The task was cancelled or an exception was thrown during task execution.
183+
// Any failure in the timeout path should still force termination rather than hang.
184+
_processTerminationCompletionSource.TrySetResult(forcedTerminationExitCode);
117185
}
118186
}
119187

120188
public void Dispose()
121189
{
122190
_sigIntRegistration?.Dispose();
123191
_sigTermRegistration?.Dispose();
192+
_sigQuitRegistration?.Dispose();
124193

125194
Console.CancelKeyPress -= OnCancelKeyPress;
126195
AppDomain.CurrentDomain.ProcessExit -= OnProcessExit;

src/Aspire.Cli/Program.cs

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -801,6 +801,7 @@ public static async Task<int> Main(string[] args)
801801
var logBufferContext = new ConsoleLogBufferContext();
802802
var (loggerFactory, fileLoggerProvider) = CreateLoggerFactory(args, loggingOptions, errorWriter, logBufferContext);
803803
var logger = loggerFactory.CreateLogger(RootLoggerName);
804+
cancellationManager.SetLogger(logger);
804805
using var startupContext = new CliStartupContext(loggingOptions, errorWriter, loggerFactory, fileLoggerProvider, logBufferContext, logger);
805806

806807
logger.LogInformation("Aspire CLI version: {Version}", AspireCliTelemetry.GetCliVersion());
@@ -899,9 +900,10 @@ public static async Task<int> Main(string[] args)
899900
var firstCompletedTask = await Task.WhenAny(handlerTask, cancellationManager.ProcessTerminationCompletionSource.Task);
900901
if (firstCompletedTask != handlerTask)
901902
{
902-
// The termination signal triggered cancellation and the timeout has completed. Kill the process.
903+
// ProcessTerminationCompletionSource was signaled — either the graceful-shutdown
904+
// timeout elapsed, or a second signal forced immediate termination.
903905
// handlerTask is not awaited because the process is shutting down and we assume the task is hung.
904-
logger.LogWarning("Timeout waiting for cancellation from termination signal.");
906+
logger.LogWarning("Termination signal forced process exit.");
905907
}
906908

907909
exitCode = await firstCompletedTask; // return the result or propagate the exception

src/Aspire.Cli/Projects/ProcessGuestLauncher.cs

Lines changed: 15 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -123,6 +123,7 @@ public ProcessGuestLauncher(string language, ILogger logger, FileLoggerProvider?
123123

124124
AddEvent(activity, ProfilingTelemetry.Events.GuestProcessStart);
125125
process.Start();
126+
_logger.LogDebug("{Language} guest process {ProcessId} started: {Command}", _language, process.Id, resolvedCommandPath);
126127
activity?.SetTag(TelemetryConstants.Tags.ProcessPid, process.Id);
127128
AddEvent(activity, ProfilingTelemetry.Events.GuestProcessStarted, TelemetryConstants.Tags.ProcessPid, process.Id);
128129
if (afterLaunchAsync is not null)
@@ -135,7 +136,12 @@ public ProcessGuestLauncher(string language, ILogger logger, FileLoggerProvider?
135136

136137
try
137138
{
138-
await process.WaitForExitAsync(cancellationToken);
139+
var waitForExitTask = process.WaitForExitAsync(cancellationToken);
140+
141+
using var _ = cancellationToken.Register(() =>
142+
_logger.LogInformation("Cancellation requested while waiting for {Language} guest process {ProcessId} to exit", _language, process.Id));
143+
144+
await waitForExitTask.ConfigureAwait(false);
139145
}
140146
catch (OperationCanceledException)
141147
{
@@ -152,6 +158,7 @@ public ProcessGuestLauncher(string language, ILogger logger, FileLoggerProvider?
152158
// the redirected output streams have time to drain.
153159
if (!process.HasExited)
154160
{
161+
_logger.LogInformation("Killing {Language} guest process tree {ProcessId}", _language, process.Id);
155162
try
156163
{
157164
process.Kill(entireProcessTree: true);
@@ -161,10 +168,17 @@ public ProcessGuestLauncher(string language, ILogger logger, FileLoggerProvider?
161168
_logger.LogDebug(killEx, "Failed to kill guest process {ProcessId} after cancellation", process.Id);
162169
}
163170
}
171+
else
172+
{
173+
_logger.LogDebug("{Language} guest process {ProcessId} already exited before kill", _language, process.Id);
174+
}
164175

176+
_logger.LogDebug("Waiting for {Language} guest process {ProcessId} to exit after kill", _language, process.Id);
165177
await process.WaitForExitAsync(CancellationToken.None).ConfigureAwait(false);
166178
}
167179

180+
_logger.LogDebug("{Language} guest process {ProcessId} exited with code {ExitCode}", _language, process.Id, process.ExitCode);
181+
168182
activity?.SetTag(TelemetryConstants.Tags.ProcessExitCode, process.ExitCode);
169183
AddEvent(activity, ProfilingTelemetry.Events.GuestProcessExited, TelemetryConstants.Tags.ProcessExitCode, process.ExitCode);
170184

tests/Aspire.Cli.EndToEnd.Tests/AgentCommandTests.cs

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -122,7 +122,7 @@ public async Task AgentInitCommand_MigratesDeprecatedConfig()
122122
await auto.TypeAsync("aspire agent init --workspace-root . --skill-locations none --skills none");
123123
await auto.EnterAsync();
124124
await auto.WaitUntilTextAsync("configuration complete", timeout: TimeSpan.FromSeconds(30));
125-
await auto.WaitForSuccessPromptFailFastAsync(counter);
125+
await auto.WaitForSuccessPromptAsync(counter);
126126

127127
// Step 3: Verify config was updated to new format
128128
// The updated config should contain "agent" and "mcp" but not "start"
@@ -211,7 +211,7 @@ await auto.WaitUntilAsync(
211211
// the default Aspire skills from the seeded bundle.
212212
await auto.EnterAsync();
213213
await auto.WaitUntilTextAsync("configuration complete", timeout: TimeSpan.FromSeconds(30));
214-
await auto.WaitForSuccessPromptFailFastAsync(counter);
214+
await auto.WaitForSuccessPromptAsync(counter);
215215

216216
// Verify skill files were created (skills are now installed at .agents/skills/ by StandardLocationAgentEnvironmentScanner)
217217
var skillFilePath = Path.Combine(workspace.WorkspaceRoot.FullName, ".agents", "skills", "aspire", "SKILL.md");

tests/Aspire.Cli.EndToEnd.Tests/CSharpProjectModeInitTests.cs

Lines changed: 3 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -104,10 +104,9 @@ public async Task AspireInitWithSolutionFileGeneratesAppHostThatBuildsAgainstCha
104104
// `dotnet build` fails with `error MSB4236: The SDK 'Aspire.AppHost.Sdk/...'
105105
// could not be found.` 3 minutes is enough headroom for a cold restore + build on
106106
// CI; the cache-hit case (the template's `restore` post-action already populated
107-
// ~/.nuget/packages during init) finishes well under 30 seconds. Using the
108-
// fail-fast helper so a build failure surfaces immediately via the shell's
109-
// numbered ERR prompt instead of timing out.
110-
await auto.RunCommandFailFastAsync(
107+
// ~/.nuget/packages during init) finishes well under 30 seconds. A build failure
108+
// surfaces immediately via the shell's ERR prompt instead of timing out.
109+
await auto.RunCommandAsync(
111110
"dotnet build Test.AppHost/Test.AppHost.csproj",
112111
counter,
113112
TimeSpan.FromMinutes(3));

0 commit comments

Comments
 (0)