Summary
Build a canonical Docker image for dazzlecmd that gives the tester agent a fully isolated execution environment, making the "always work on temp copies" principle structural rather than discipline-dependent. The image becomes the standard substrate for human-test-checklist execution and reproducible regression runs.
This is distinct from existing Docker-related work:
4c-T1/4c-T2 in the closeout master plan: prove the docker runtime type works end-to-end (a tool dispatched into a container). That's about the runner. This issue is about test-environment isolation.
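X-1/X-5 (closeout master plan): dazzlecmd-lib's own CI matrix. Adjacent: same containerization tooling, different consumer.
dazzlecmd-docker is a building block of #42 (Test matrix / cross-environment testing substrate) but smaller in scope.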
Motivation
Two recurring problems:
Problem 1 — Tester agent isolation is currently per-checklist discipline
Each test checklist hand-writes DAZZLECMD_CONFIG=%TEMP%\... plumbing separately for cmd.exe, PowerShell, and POSIX shells. The tester agent (per user direction) "should ALWAYS be trying to work on some temp copy not real data where possible." Today this is enforced only by checklist-author discipline; a single oversight in one step contaminates the developer's real ~/.dz/config.json.
A canonical container makes this structural: the container has no host filesystem access except explicitly-mounted directories, so there's no ~/.dz/ to leak into. The tester agent runs docker run --rm dazzlecmd:test pytest (or equivalent) and any state mutation dies with the container.
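The isolation argument rests on two `docker run` properties: only paths passed with `-v` are visible from the host, and `--rm` deletes the container (and its writable layer) on exit. The following is a minimal sketch of a hypothetical helper (`docker_run_argv`, not an existing dazzlecmd API) that builds such a command line. The `/work` and `/checklists` container paths come from the Phase 1 proposal; making `/checklists` read-only is an extra assumption, not part of the proposal.

```python
import os


def docker_run_argv(image, command, mounts, read_only=()):
    """Build argv for a throwaway `docker run --rm` invocation (hypothetical helper).

    mounts maps host paths to container paths. Only these paths are
    bind-mounted; anything else the command writes (for example a stray
    ~/.dz/config.json) lands in the container's writable layer, which
    --rm discards when the container exits.
    """
    argv = ["docker", "run", "--rm"]
    for host_path, container_path in mounts.items():
        # Bind mounts need an absolute host path; a bare name such as
        # "proj" would be interpreted by docker as a named volume instead.
        spec = f"{os.path.abspath(host_path)}:{container_path}"
        if container_path in read_only:
            spec += ":ro"  # the container may read, but not modify, this mount
        argv += ["-v", spec]
    argv.append(image)
    argv.extend(command)
    return argv


argv = docker_run_argv(
    "dazzlecmd:test",
    ["pytest"],
    {"/home/dev/dazzlecmd": "/work", "/home/dev/dazzlecmd/tests/checklists": "/checklists"},
    read_only={"/checklists"},
)
print(" ".join(argv))
```

Building the argv as a list, rather than a shell string, sidesteps quoting differences between cmd.exe, PowerShell, and POSIX shells, which is the same per-shell plumbing problem described above.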
Problem 2 — Reproducible regression runs across machines
The current gap of no tester sweep on v0.7.25/26/27/28 (4 versions without verification, surfaced 2026-04-29) was partly because tester runs are slow on a real dev box and pollute working state. A container makes a tester run cheap and repeatable: docker run --rm dazzlecmd:test ./run-checklist v0.7.27 should produce the same output on any machine.
This also unlocks parallel checklist execution: spin up 4 containers, run v0.7.25/26/27/28 simultaneously, collect SHIP/HOLD reports.
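A parallel sweep could look roughly like the sketch below: one throwaway container per version, run concurrently, with each result reduced to a verdict. It assumes that `./run-checklist` exits 0 when the automated portions pass and that a zero exit maps to SHIP. Both the exit-code convention and the function names are hypothetical. Threads are sufficient here because the real work happens in the `docker` subprocesses.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_checklist_in_container(version):
    """Run one checklist version in its own throwaway container; return its exit code.

    Assumes ./run-checklist exits 0 when the automated portions pass
    (a hypothetical convention).
    """
    completed = subprocess.run(
        ["docker", "run", "--rm", "dazzlecmd:test", "./run-checklist", version],
        capture_output=True,
        text=True,
    )
    return completed.returncode


def sweep(versions, runner=run_checklist_in_container, max_workers=4):
    """Run every version concurrently and map each exit code to SHIP or HOLD."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs them correctly.
        exit_codes = list(pool.map(runner, versions))
    return {
        version: "SHIP" if code == 0 else "HOLD"
        for version, code in zip(versions, exit_codes)
    }
```

Calling `sweep(["v0.7.25", "v0.7.26", "v0.7.27", "v0.7.28"])` would start four containers at once. The `runner` parameter exists so the aggregation logic can be tested without Docker.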
Proposed solution
Phase 1: Base image
Dockerfile at repo root or tools/docker/Dockerfile:
Base: python:3.11-slim (or matrix later)
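Install dazzlecmd + dazzlecmd-lib in editable mode
Include runner tooling (docker CLI, node, bun, pwsh) per the #42 expansion
Mount /work for the project under test and /checklists for tests/checklists/
Default to running pytest or a wrapper script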
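If the image runs a wrapper script rather than bare pytest, that wrapper is a natural place to give every run a fresh DAZZLECMD_CONFIG. The following is a minimal sketch, assuming that DAZZLECMD_CONFIG names a config file (the issue refers to the real ~/.dz/config.json). The `isolated_env`/`main` names and the `config.json` file name are illustrative, not existing dazzlecmd code.

```python
import os
import subprocess
import tempfile


def isolated_env(base_env, scratch_dir):
    """Return a copy of base_env whose DAZZLECMD_CONFIG points into scratch_dir.

    Assumes DAZZLECMD_CONFIG names a config *file*; the file name is illustrative.
    """
    env = dict(base_env)
    env["DAZZLECMD_CONFIG"] = os.path.join(scratch_dir, "config.json")
    return env


def main(args):
    """Run args (default: pytest) with a fresh, throwaway DAZZLECMD_CONFIG."""
    command = list(args) or ["pytest"]
    with tempfile.TemporaryDirectory(prefix="dazzlecmd-") as scratch_dir:
        completed = subprocess.run(command, env=isolated_env(os.environ, scratch_dir))
    # The scratch directory (and any config written there) is gone by now.
    return completed.returncode
```

An image entrypoint could call `sys.exit(main(sys.argv[1:]))`. Inside a `--rm` container the temporary directory is belt-and-braces, but the same wrapper also behaves safely if someone runs it directly on a dev box.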
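Phase 2: Test runner script
tools/docker/run-checklist.sh (POSIX) + tools/docker/run-checklist.cmd (Windows):
./run-checklist v0.7.27 runs the named checklist's automated portions inside the container
--shell drops into the container for manual steps
Phase 3: Tester agent integration
The tester agent (the tester subagent, per its definition) gains awareness of the container:
Prefers container execution when tools/docker/Dockerfile exists
Phase 4: CI integration
GitHub Actions workflow .github/workflows/checklist-runs.yml runs container-based checklist sweeps on release-candidate tags
Phase 5: Multi-distro matrix (#42 territory)
Multi-arch / multi-distro images (Alpine, Debian, Ubuntu, RHEL-family). This defers to #42's full scope, but the base infrastructure will already be in place.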
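The run-checklist interface (a version argument plus `--shell`) is sketched below in Python to show the intended behavior. This is a swapped-in, cross-platform stand-in for the proposed run-checklist.sh / run-checklist.cmd, not the planned scripts. It makes two assumptions. First, checklist files are named `<version>__*.md`, inferred from the `v0.7.NN__Tool__dazzlecmd-docker.md` pattern. Second, the image contains an in-container runner; `IN_CONTAINER_RUNNER` and its `checklist_runner` module are hypothetical placeholders.

```python
import argparse
from pathlib import Path

IMAGE = "dazzlecmd:test"
# Hypothetical in-container command that executes a checklist's automated steps.
IN_CONTAINER_RUNNER = ["python", "-m", "checklist_runner"]


def parse_args(argv):
    parser = argparse.ArgumentParser(prog="run-checklist")
    parser.add_argument("version", help="checklist version, e.g. v0.7.27")
    parser.add_argument(
        "--shell", action="store_true",
        help="open an interactive shell in the container for manual steps",
    )
    return parser.parse_args(argv)


def find_checklists(version, checklist_dir):
    """Checklists named '<version>__*.md'; the '__' keeps v0.7.2 from matching v0.7.27."""
    return sorted(Path(checklist_dir).glob(f"{version}__*.md"))


def build_docker_argv(args, repo_root):
    repo = Path(repo_root).absolute()
    checklist_dir = repo / "tests" / "checklists"
    argv = [
        "docker", "run", "--rm",
        "-v", f"{repo}:/work",
        "-v", f"{checklist_dir}:/checklists",
    ]
    if args.shell:
        # -it attaches a terminal so the tester can work through manual steps.
        return argv + ["-it", IMAGE, "bash"]
    checklists = find_checklists(args.version, checklist_dir)
    if not checklists:
        raise SystemExit(f"no checklist found for {args.version}")
    return argv + [IMAGE, *IN_CONTAINER_RUNNER] + [f"/checklists/{p.name}" for p in checklists]
```

Both modes keep `--rm`, so even an interactive `--shell` session leaves nothing behind once the tester exits.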
Acceptance criteria
Dockerfile builds a working dazzlecmd:test image with dazzlecmd installed + pytest available
Container runs the existing pytest suite green (937+ tests)
tools/docker/run-checklist.sh exists and runs a named checklist's automated portions inside the container
Container has zero persistent state; docker run --rm leaves no trace
Tester agent definition updated to prefer container execution when Dockerfile exists
Documentation: docs/guides/test-isolation-with-docker.md covering when/why to use container runs
CI workflow runs container-based checklist sweeps on release-candidate tags
All Phase 4e checklists (v0.7.25/26/27/28) verified through the new container substrate
Human test checklist for the docker test substrate itself: tests/checklists/v0.7.NN__Tool__dazzlecmd-docker.md
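The "zero persistent state" criterion can be checked mechanically, at least for containers. The following is a minimal sketch of a hypothetical helper that lists every container, running or stopped, created from the test image, using the real `docker ps --all --quiet --filter ancestor=...` options. After a `docker run --rm dazzlecmd:test ...` exits, it should return an empty list. It does not inspect images, named volumes, or anything on mounted host paths.

```python
import subprocess


def leftover_containers(image="dazzlecmd:test", run=subprocess.run):
    """Return IDs of any containers (running or stopped) created from image."""
    completed = run(
        ["docker", "ps", "--all", "--quiet", "--filter", f"ancestor={image}"],
        capture_output=True,
        text=True,
        check=True,  # fail loudly if the docker CLI or daemon is unavailable
    )
    # --quiet prints one container ID per line; no output means nothing is left.
    return completed.stdout.split()
```

The `run` parameter is there so the check can be unit-tested with a stub instead of a live Docker daemon.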
Decision points
Image base: python:3.11-slim for size, or a fuller distro for runner coverage (python:3.11 non-slim, ubuntu:22.04)? Trade-off: slim is smaller and faster to build, but its missing tooling means the runner tests can't be fully exercised.
Where does it live: tools/docker/ in the dazzlecmd repo (where it's used), or a separate dazzlecmd-docker repo coordinated with the dazzlecmd-lib extraction (#53: subtree-split + own CI + PyPI namespace)? Recommendation: tools/docker/ for now; extract it if it grows to cover the full test matrix.
Tester agent default: opt-in via flag, or default-on when Docker is available? Recommendation: default-on with --no-container escape hatch.
Checklist format implication: do existing checklists need rewriting to be container-friendly? They use DAZZLECMD_CONFIG=%TEMP%\... patterns that work both ways. Recommendation: no rewrite; the container substitutes its own DAZZLECMD_CONFIG automatically.
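The recommended tester-agent default (container on, with a `--no-container` escape hatch) combined with the Phase 3 rule (prefer the container when tools/docker/Dockerfile exists) reduces to a small decision function. The sketch below is hypothetical and only illustrates the proposed policy. Finding `docker` on PATH does not prove the daemon is reachable; a stricter version could also require `docker info` to exit 0.

```python
import shutil
from pathlib import Path


def use_container(repo_root, no_container=False, which=shutil.which):
    """Return True when a tester run should go through the dazzlecmd:test container.

    Default-on: the container is used unless --no-container was passed, the repo
    has no tools/docker/Dockerfile, or no docker executable is on PATH.
    """
    if no_container:
        return False  # the explicit escape hatch always wins
    if not (Path(repo_root) / "tools" / "docker" / "Dockerfile").is_file():
        return False
    return which("docker") is not None
```

Injecting `which` keeps the policy testable on machines without Docker installed.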
Related issues
Refs 4c-T1/4c-T2 in the master closeout plan (related but distinct: those validate the docker runner; this builds a docker test environment)
Analysis
Master closeout plan: 2026-04-29__07-41-11__claude-plan__0-7-x-closeout-ultraplan.md (this issue will be added as cross-cutting infrastructure, sibling to X-1 lib extraction)
Origin of concern: tester-agent-isolation gap surfaced 2026-04-29 (no tester sweep on v0.7.25/26/27/28 + per-checklist temp-copy plumbing is fragile)