Flaky: ResourceCommand_FailsWhenInteractionServiceIsRequired times out on aspire stop with DCP container still running

## Failing test

`Aspire.Cli.EndToEnd.Tests.ResourceCommandTests.ResourceCommand_FailsWhenInteractionServiceIsRequired`

Source: `tests/Aspire.Cli.EndToEnd.Tests/ResourceCommandTests.cs`

## Failure signature

The test drives the full sequence "trigger a resource command that requires `IInteractionService` → assert the expected error → `aspire stop`". The final step times out at the Hex1b automation level:

```
Step 72 of 72 failed — WaitUntilText(" stopped successfully.")
  Timed out after 00:01:00 waiting for: text " stopped successfully." to appear
  at CliE2EAutomatorHelpers.cs:84
```

The terminal recording shows that `aspire stop` does not silently hang — it actively prints a failure after spinning for ~60s:

```
⠳ Stopping apphost.cs...
❌ Failed to stop apphost.cs.
📄 See logs at /root/.aspire/logs/cli_<id>.log
🔍 See AppHost logs at /root/.aspire/logs/cli_<id>_detach-child_<id>.log
```

The post-step `docker container ls` in the same CI job shows the redis container the AppHost owns is still up at the moment of failure:

```
CONTAINER ID   IMAGE          STATUS         NAMES
5aa9f3938387   redis:latest   Up 7 seconds   cache-xuhbbbzq
```

So `ProcessShutdownService.StopProcessesAsync` exhausted its graceful + force-kill + monitor window without DCP finishing the container teardown, returned `false`, and `StopCommand` emitted `FailedToStopAppHost`. The Hex1b `WaitUntilText(" stopped successfully.")` then times out as a downstream symptom.

## Why this is a flake, not a deterministic failure

Observed on PR #17452, run [26425521402](https://github.com/microsoft/aspire/actions/runs/26425521402):

| Attempt | ResourceCommandTests |
| --- | --- |
| 1 | ❌ fails (same shape as above) |
| 2 | ✅ passes |
| 3 | ❌ fails (same shape) |

PR #17452 only touches CLI init / channel / scaffolding code paths (`InitCommand`, `PackageChannel`, `GuestAppHostProject`, `ProjectUpdater`, `ScaffoldingService`). None of those are on the `aspire stop` / `ProcessShutdownService` / backchannel / DCP container shutdown path, so this is not a regression introduced by that PR. It surfaces a pre-existing intermittent shutdown-timing flake.

## Suspected root cause

This test is the only `ResourceCommandTests` case that combines:

- `mountDockerSocket: true` → runs a real `redis` container under DCP
- a deliberately-failing resource command (`IInteractionService` unavailable) immediately before `aspire stop`

`ProcessShutdownService.StopProcessesAsync` (see `src/Aspire.Cli/Processes/ProcessShutdownService.cs`) currently uses:

- `s_processTerminationTimeout = 10s` for the post-graceful-shutdown monitor
- followed by `ProcessSignaler.ForceKill` on the AppHost process tree
- followed by another `MonitorProcessesForTerminationAsync` pass

Under Docker-in-Docker contention on the GitHub `ubuntu-latest` runner, DCP's container teardown can outlast that budget, so the AppHost process does not exit and the CLI prints `FailedToStopAppHost`.
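To make the timing argument concrete, here is a minimal simulation of the three-phase sequence described above (monitor, force kill, monitor again). It is a sketch under stated assumptions, not the actual `ProcessShutdownService` code: `SimulatedAppHost`, `stop_processes`, and the `dies_on_force_kill` flag are hypothetical names, the second monitor window is assumed to reuse the same 10s value, and the real budget may differ (the recording shows ~60s of spinning). Why the force kill fails to unblock the exit in the failing runs is not established here.

```python
from dataclasses import dataclass

# Assumption: 10s mirrors s_processTerminationTimeout from this issue. Reusing
# it for the second monitor pass is a guess made for illustration only.
TERMINATION_TIMEOUT_S = 10.0


@dataclass
class SimulatedAppHost:
    """Hypothetical stand-in for an AppHost that exits once DCP teardown is done."""

    teardown_seconds: float   # time container teardown needs after the stop request
    dies_on_force_kill: bool  # whether a force kill actually ends the process tree

    def exited(self, elapsed: float, force_killed: bool) -> bool:
        # A successful force kill ends the process regardless of teardown state.
        if force_killed and self.dies_on_force_kill:
            return True
        # Otherwise the process only exits after teardown has completed.
        return elapsed >= self.teardown_seconds


def stop_processes(host: SimulatedAppHost,
                   timeout: float = TERMINATION_TIMEOUT_S) -> bool:
    """Model of: graceful stop -> monitor -> force kill -> monitor again."""
    # Phase 1: monitor window after the graceful shutdown request.
    if host.exited(elapsed=timeout, force_killed=False):
        return True
    # Phases 2 and 3: force kill, then a second monitor window.
    return host.exited(elapsed=2 * timeout, force_killed=True)


if __name__ == "__main__":
    for teardown in (3.0, 15.0, 25.0):
        host = SimulatedAppHost(teardown_seconds=teardown, dies_on_force_kill=False)
        print(f"teardown={teardown:>4}s stopped={stop_processes(host)}")
```

In this model, a teardown that outlasts both monitor windows while the force kill has no effect makes the function return `False`, which corresponds to the `FailedToStopAppHost` path seen in the recording.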

The companion test `ResourceCommand_FailedExecution_DisplaysAppHostLogPathAndLogContainsEntries` exercises a similar shape (redis + failing resource command + `aspire stop`) and has passed on the same runs so far, but it likely shares the same underlying risk.

## Speculative follow-up (separate from quarantine)

Consider whether `ProcessShutdownService` should extend its monitor window when the AppHost owns DCP-managed containers (or have `StopCommand` pass a longer per-call cancellation in container-heavy scenarios). Not in scope for this issue.

## Artifacts

- Failing CI run: https://github.com/microsoft/aspire/actions/runs/26425521402
- PR where this was observed: https://github.com/microsoft/aspire/pull/17452
- Asciinema recording in the `cli-e2e-recordings-Cli.EndToEnd-ResourceCommandTests` artifact: `ResourceCommand_FailsWhenInteractionServiceIsRequired.cast`

