dazzlecmd-docker: canonical Docker image for tester agent isolation + reproducible regression runs (#55)

## Summary

Build a canonical Docker image for dazzlecmd that gives the tester agent a fully isolated execution environment, making the "always work on temp copies" principle structural rather than discipline-dependent. The image becomes the standard substrate for human-test-checklist execution and reproducible regression runs.

This is distinct from existing Docker-related work:
- **4c-T1/4c-T2** in the closeout master plan: prove the docker *runtime type* works end-to-end (a tool dispatched into a container). That's about the runner. This issue is about test-environment isolation.
- **#42**: cross-environment test matrix (testing dazzlecmd on multiple OSes / Python versions). Adjacent — `dazzlecmd-docker` is a building block of #42 but smaller in scope.
- **X-1/X-5** (closeout master plan): dazzlecmd-lib's own CI matrix. Adjacent — same containerization tooling, different consumer.

## Motivation

Two recurring problems:

### Problem 1 — Tester agent isolation is currently per-checklist discipline

Each test checklist hand-writes `DAZZLECMD_CONFIG=%TEMP%\...` plumbing for cmd.exe / PowerShell / POSIX. The tester agent (per user direction) "should ALWAYS be trying to work on some temp copy not real data where possible." Today this is enforced only by checklist-author discipline; a single oversight in one step can contaminate the developer's real `~/.dz/config.json`.

A canonical container makes this structural: the container has no host filesystem access except explicitly mounted directories, so there's no host `~/.dz/` to leak into. The tester agent runs `docker run --rm dazzlecmd:test pytest` (or equivalent), and any state mutation dies with the container.

### Problem 2 — Reproducible regression runs across machines

The current "no tester sweep on v0.7.25/26/27/28" gap (4 versions without verification, surfaced 2026-04-29) arose partly because tester runs are slow on a real dev box and pollute working state. A container makes a tester run cheap and repeatable: `tools/docker/run-checklist.sh v0.7.27` (which wraps `docker run dazzlecmd:test ...`) should produce the same output on any machine with Docker.

This also unlocks parallel checklist execution: spin up 4 containers, run v0.7.25/26/27/28 simultaneously, collect SHIP/HOLD reports.
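A minimal Python sketch of that fan-out, assuming a `tools/docker/run-checklist.sh` that takes a version and exits 0 on SHIP and non-zero on HOLD. The exit-code convention is an assumption; this issue doesn't define one. The per-version runner is injectable so the fan-out logic can be tried without Docker.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_via_script(version: str) -> int:
    # Hypothetical script path from Phase 2; each call starts its own
    # --rm container, so parallel runs share no state.
    proc = subprocess.run(["tools/docker/run-checklist.sh", version], check=False)
    return proc.returncode


def run_parallel(
    versions: List[str],
    run_one: Callable[[str], int] = run_via_script,
    max_workers: int = 4,
) -> Dict[str, str]:
    """Run one checklist per version concurrently; map exit 0 to SHIP, else HOLD."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each version with its code.
        codes = dict(zip(versions, pool.map(run_one, versions)))
    return {v: "SHIP" if code == 0 else "HOLD" for v, code in codes.items()}
```

For example, `run_parallel(["v0.7.25", "v0.7.26", "v0.7.27", "v0.7.28"])` would start four containers at once. Threads are enough here because each worker only waits on a child process.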

## Proposed solution

### Phase 1: Base image

`Dockerfile` at repo root or `tools/docker/Dockerfile`:
- Base: `python:3.11-slim` (or matrix later)
- Install dazzlecmd + dazzlecmd-lib in editable mode
- Install pytest, the runners' optional deps (`docker` CLI, `node`, `bun`, `pwsh` per #42 expansion)
- Mount points: `/work` for the project under test; `/checklists` for `tests/checklists/`
- Default entrypoint: `pytest` or a wrapper script
- No persistent volumes — container is ephemeral
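One way to verify the Phase 1 image is a small smoke check run inside the container. This is a Python sketch; the import names (`dazzlecmd`, `pytest`) and the optional runner binaries listed are assumptions about what the image should contain, not a confirmed manifest.

```python
import importlib.util
import shutil
from typing import Dict, Iterable, List

REQUIRED_MODULES = ("dazzlecmd", "pytest")             # assumed import names
OPTIONAL_BINARIES = ("docker", "node", "bun", "pwsh")  # runner deps per #42


def check_image(modules: Iterable[str] = REQUIRED_MODULES,
                binaries: Iterable[str] = OPTIONAL_BINARIES) -> Dict[str, bool]:
    """Report which modules are importable and which binaries are on PATH."""
    report: Dict[str, bool] = {}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        report[f"module:{name}"] = importlib.util.find_spec(name) is not None
    for exe in binaries:
        # shutil.which returns None when the executable is not on PATH.
        report[f"binary:{exe}"] = shutil.which(exe) is not None
    return report


def missing_required(report: Dict[str, bool]) -> List[str]:
    """Modules that must be present for the image to count as working."""
    return [k for k, ok in report.items() if k.startswith("module:") and not ok]
```

A non-empty `missing_required(...)` would signal a broken image, while a missing optional binary could be reported as reduced runner coverage rather than an image failure.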

### Phase 2: Test runner script

`tools/docker/run-checklist.sh` (POSIX) + `tools/docker/run-checklist.cmd` (Windows):
- Takes a checklist filename or version: `./run-checklist v0.7.27`
- Spins up the container with the project tree mounted read-only
- Executes the automated portions of the checklist
- Outputs structured PASS/FAIL/MANUAL report (same format as the existing tester agent emits)
- Optional `--shell` to drop into the container for manual steps
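A Python sketch of the invocation `run-checklist` could assemble. It assumes checklist files follow the `<version>__*.md` naming seen in `tests/checklists/`, the `dazzlecmd:test` tag from Phase 1, and a hypothetical in-container command `run-checklist-steps` for the automated portions.

```python
from pathlib import Path
from typing import List


def resolve_checklists(checklist_dir: Path, target: str) -> List[Path]:
    """Accept an explicit checklist filename or a version prefix like v0.7.27."""
    explicit = checklist_dir / target
    if explicit.is_file():
        return [explicit]
    matches = sorted(checklist_dir.glob(f"{target}__*.md"))
    if not matches:
        raise FileNotFoundError(f"no checklist matches {target!r} in {checklist_dir}")
    return matches


def docker_argv(project: Path, checklist: Path, shell: bool = False,
                image: str = "dazzlecmd:test") -> List[str]:
    argv = ["docker", "run", "--rm"]  # container is removed on exit
    if shell:
        argv.append("-it")  # interactive TTY for manual steps
    argv += [
        "-v", f"{project.resolve()}:/work:ro",  # project tree, read-only
        "-v", f"{checklist.parent.resolve()}:/checklists:ro",
        "-w", "/work",
        image,
    ]
    if shell:
        argv.append("/bin/sh")
    else:
        # Hypothetical entry point that executes the automated steps.
        argv += ["run-checklist-steps", f"/checklists/{checklist.name}"]
    return argv
```

Both mounts are read-only, so the only writable storage is the container's own filesystem, which `--rm` discards.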

### Phase 3: Tester agent integration

The tester agent (`tester` subagent, per its definition) gains awareness of the container:
- Detects `tools/docker/Dockerfile` exists
- Defaults to running checklist commands inside the container
- Falls back to host execution if Docker is unavailable, with a warning
- Reports any test that requires host-side action (interactive prompts, GUI) as MANUAL
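The detection and fallback rules above reduce to a small decision function. This Python sketch assumes the `tools/docker/Dockerfile` path and the `--no-container` escape hatch from decision point 3; the function name and the injectable `which` parameter are hypothetical.

```python
import shutil
import warnings
from pathlib import Path
from typing import Callable, Optional


def choose_execution_mode(
    repo_root: Path,
    no_container: bool = False,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Return "container" or "host" for running checklist commands."""
    if no_container:
        return "host"  # explicit opt-out
    if not (repo_root / "tools" / "docker" / "Dockerfile").is_file():
        return "host"  # repo has no canonical image
    if which("docker") is None:
        warnings.warn("Docker unavailable; falling back to host execution")
        return "host"
    return "container"
```

Injecting `which` keeps the decision testable on machines without Docker installed.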

### Phase 4: CI integration

GitHub Actions workflow `.github/workflows/checklist-runs.yml`:
- On every release-candidate tag, runs all checklists in containers
- Posts SHIP/HOLD report as a comment on the corresponding GitHub release
- Catches regressions before tag-and-release
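This issue names the SHIP/HOLD verdict but not the rule that produces it. A Python sketch under an assumed rule: any FAIL means HOLD, and MANUAL steps don't block the automated verdict but are listed for a human to finish. `render_comment` is a hypothetical helper for the posted text; posting it from the workflow is out of scope here.

```python
from collections import Counter
from typing import Dict, List


def verdict(results: Dict[str, str]) -> Dict[str, object]:
    """Roll per-step PASS/FAIL/MANUAL results into a SHIP/HOLD summary."""
    counts = Counter(results.values())
    unknown = set(counts) - {"PASS", "FAIL", "MANUAL"}
    if unknown:
        raise ValueError(f"unexpected status values: {sorted(unknown)}")
    manual: List[str] = sorted(step for step, r in results.items() if r == "MANUAL")
    return {
        "verdict": "HOLD" if counts["FAIL"] else "SHIP",  # Counter gives 0 if absent
        "counts": {k: counts[k] for k in ("PASS", "FAIL", "MANUAL")},
        "manual_steps": manual,
    }


def render_comment(version: str, summary: Dict[str, object]) -> str:
    """Format a summary as Markdown, with MANUAL steps as an open task list."""
    c = summary["counts"]
    lines = [f"**{version}: {summary['verdict']}**",
             f"PASS {c['PASS']} / FAIL {c['FAIL']} / MANUAL {c['MANUAL']}"]
    lines += [f"- [ ] {step}" for step in summary["manual_steps"]]
    return "\n".join(lines)
```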

### Phase 5: Multi-distro matrix (#42 territory)

Multi-arch / multi-distro images (Alpine, Debian, Ubuntu, RHEL-family) — defers to #42's full scope but the base infrastructure is in place.

## Acceptance criteria

- [ ] `Dockerfile` builds a working `dazzlecmd:test` image with dazzlecmd installed + pytest available
- [ ] Container runs the existing pytest suite green (937+ tests)
- [ ] `tools/docker/run-checklist.sh` exists and runs a named checklist's automated portions inside the container
- [ ] Container has zero persistent state; `docker run --rm` leaves no trace
- [ ] Tester agent definition updated to prefer container execution when `Dockerfile` exists
- [ ] Documentation: `docs/guides/test-isolation-with-docker.md` covering when/why to use container runs
- [ ] CI workflow runs container-based checklist sweeps on release-candidate tags
- [ ] All Phase 4e checklists (v0.7.25/26/27/28) verified through the new container substrate
- [ ] Human test checklist for the docker test substrate itself: `tests/checklists/v0.7.NN__Tool__dazzlecmd-docker.md`

## Decision points

1. **Image base**: `python:3.11-slim` for size, or a fuller distro for runner coverage (`python:3.11` non-slim, `ubuntu:22.04`)? Trade-off: slim is fast to build, but its missing tooling means runner tests can't fully exercise the runners.
2. **Where does it live**: `tools/docker/` in the dazzlecmd repo (where it's used), or a separate `dazzlecmd-docker` repo coordinated with #53 (lib extraction)? Recommendation: `tools/docker/` for now; extract if it grows to cover full test matrix.
3. **Tester agent default**: opt-in via flag, or default-on when Docker is available? Recommendation: default-on with `--no-container` escape hatch.
4. **Checklist format implication**: do existing checklists need rewriting to be container-friendly? They use `DAZZLECMD_CONFIG=%TEMP%\...` patterns that work both ways. Recommendation: no rewrite; the container substitutes its own `DAZZLECMD_CONFIG` automatically.
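Decision point 4 implies the image sets its own `DAZZLECMD_CONFIG`. A minimal Python sketch of an entrypoint that does so, assuming a scratch directory under the container's temp dir and a `config.json` file name (both hypothetical):

```python
import os
import tempfile
from typing import Dict, List, Mapping


def isolated_env(base: Mapping[str, str], scratch_root: str) -> Dict[str, str]:
    """Copy the environment, pointing DAZZLECMD_CONFIG at a scratch file."""
    env = dict(base)
    # Replace any inherited value so default writes land in scratch space.
    env["DAZZLECMD_CONFIG"] = os.path.join(scratch_root, "config.json")
    return env


def main(argv: List[str]) -> None:
    scratch = tempfile.mkdtemp(prefix="dz-")  # inside the container's filesystem
    env = isolated_env(os.environ, scratch)
    cmd = argv[1:] or ["pytest"]  # default command when none is given
    os.execvpe(cmd[0], cmd, env)  # replaces this process; returns only on error
```

Wired up as the image's entrypoint, `main(sys.argv)` would run `pytest` by default, or whatever command follows the image name. A checklist step that sets its own `DAZZLECMD_CONFIG` still resolves to a path inside the container, so either way `docker run --rm` discards the config on exit.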

## Related issues

- Refs #30 — Phase 4 epic (this is closeout infrastructure; goes under Phase 4 epic)
- Refs #42 — Test matrix / cross-environment testing substrate (this is a foundational piece; #42 is the broader matrix)
- Refs #53 — dazzlecmd-lib repo extraction (sibling work; lib gets its own containerization)
- Refs 4c-T1/4c-T2 in master closeout plan (related but distinct: those validate the docker *runner*, this builds a docker *test environment*)

## Analysis

- Master closeout plan: `2026-04-29__07-41-11__claude-plan__0-7-x-closeout-ultraplan.md` (this issue will be added as cross-cutting infrastructure, sibling to X-1 lib extraction)
- Origin of concern: tester-agent-isolation gap surfaced 2026-04-29 (no tester sweep on v0.7.25/26/27/28 + per-checklist temp-copy plumbing is fragile)

