Failing test
Aspire.Cli.EndToEnd.Tests.ResourceCommandTests.ResourceCommand_FailsWhenInteractionServiceIsRequired
Source: tests/Aspire.Cli.EndToEnd.Tests/ResourceCommandTests.cs
Failure signature
The test asserts the full happy-path of "trigger a resource command that requires IInteractionService → assert the expected error → aspire stop". The final step times out at the Hex1b automation level:
Step 72 of 72 failed — WaitUntilText(" stopped successfully.")
Timed out after 00:01:00 waiting for: text " stopped successfully." to appear
at CliE2EAutomatorHelpers.cs:84
The terminal recording shows that aspire stop does not silently hang — it actively prints a failure after spinning for ~60s:
⠳ Stopping apphost.cs...
❌ Failed to stop apphost.cs.
📄 See logs at /root/.aspire/logs/cli_<id>.log
🔍 See AppHost logs at /root/.aspire/logs/cli_<id>_detach-child_<id>.log
The post-step docker container ls in the same CI job shows the redis container the AppHost owns is still up at the moment of failure:
CONTAINER ID IMAGE STATUS NAMES
5aa9f3938387 redis:latest Up 7 seconds cache-xuhbbbzq
So ProcessShutdownService.StopProcessesAsync exhausted its graceful + force-kill + monitor window without DCP finishing the container teardown, returned false, and StopCommand emitted FailedToStopAppHost. The Hex1b WaitUntilText(" stopped successfully.") then times out as a downstream symptom.
Why this is a flake, not a deterministic failure
Observed on PR #17452, run 26425521402:
| Attempt | ResourceCommandTests |
| --- | --- |
| 1 | ❌ fails (same shape as above) |
| 2 | ✅ passes |
| 3 | ❌ fails (same shape) |
PR #17452 only touches CLI init / channel / scaffolding code paths (InitCommand, PackageChannel, GuestAppHostProject, ProjectUpdater, ScaffoldingService); none of those are on the aspire stop / ProcessShutdownService / backchannel / DCP container shutdown path, so this is not a regression introduced by that PR — it surfaces a pre-existing intermittent shutdown-timing flake.
Suspected root cause
This test is the only ResourceCommandTests case that combines:
- mountDockerSocket: true → runs a real redis container under DCP
- a deliberately-failing resource command (IInteractionService unavailable) immediately before aspire stop
ProcessShutdownService.StopProcessesAsync (see src/Aspire.Cli/Processes/ProcessShutdownService.cs) currently uses:
- s_processTerminationTimeout = 10s for the post-graceful-shutdown monitor
- followed by ProcessSignaler.ForceKill on the AppHost process tree
- followed by another MonitorProcessesForTerminationAsync pass
Under Docker-in-Docker contention on the GitHub ubuntu-latest runner, DCP's container teardown can outlast that budget, so the AppHost process does not exit and the CLI prints FailedToStopAppHost.
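To make the timing argument concrete, here is a minimal toy model of the sequence described above (graceful request, monitor window, force-kill, second monitor pass). It is a sketch for reasoning about the budget, not the actual ProcessShutdownService code. The names simulate_stop and StopResult are hypothetical, and the length of the second monitor window is an assumption, since only the 10s post-graceful window is stated above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StopResult:
    stopped: bool  # True if the modelled process tree exited within the budget
    phase: str     # "graceful", "force-kill", or "timed-out"


def simulate_stop(
    exit_after_graceful_s: Optional[float],
    exit_after_force_kill_s: Optional[float],
    monitor_window_s: float = 10.0,         # the 10s post-graceful window from the text
    second_monitor_window_s: float = 10.0,  # ASSUMPTION: real duration is not stated
) -> StopResult:
    """Toy model of the stop sequence; all inputs are hypothetical durations.

    exit_after_graceful_s: seconds the process needs to exit after the
        graceful request (None means it never exits on its own).
    exit_after_force_kill_s: seconds it needs to exit after the force-kill
        (None means it is still observed as running afterwards).
    """
    # Phase 1: graceful request, then monitor for up to monitor_window_s.
    if exit_after_graceful_s is not None and exit_after_graceful_s <= monitor_window_s:
        return StopResult(True, "graceful")

    # Phase 2: force-kill, then a second monitoring pass.
    if exit_after_force_kill_s is not None and exit_after_force_kill_s <= second_monitor_window_s:
        return StopResult(True, "force-kill")

    # Neither window was enough: the caller would report a failed stop.
    return StopResult(False, "timed-out")


if __name__ == "__main__":
    # Fast teardown: exits within the graceful window.
    print(simulate_stop(3.0, 1.0))
    # Slow container teardown: misses the graceful window, exits after force-kill.
    print(simulate_stop(25.0, 2.0))
    # Teardown outlasts both windows: modelled as a failed stop.
    print(simulate_stop(25.0, None))
```

In this model, the failure in this issue corresponds to the third case: teardown that outlasts both windows, which is consistent with the redis container still being listed as Up after the failed stop.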
The companion test ResourceCommand_FailedExecution_DisplaysAppHostLogPathAndLogContainsEntries exercises a similar shape (redis + failing resource command + aspire stop) and has so far passed on the same runs — but it almost certainly shares the same underlying risk.
Speculative follow-up (separate from quarantine)
Consider whether ProcessShutdownService should extend its monitor window when the AppHost owns DCP-managed containers (or have StopCommand pass a longer per-call cancellation in container-heavy scenarios). Not in scope for this issue.
Artifacts
cli-e2e-recordings-Cli.EndToEnd-ResourceCommandTests artifact: ResourceCommand_FailsWhenInteractionServiceIsRequired.cast