
dazzlecmd-docker: canonical Docker image for tester agent isolation + reproducible regression runs #55

@djdarcy

Summary

Build a canonical Docker image for dazzlecmd that gives the tester agent a fully isolated execution environment, making the "always work on temp copies" principle structural rather than discipline-dependent. The image becomes the standard substrate for human-test-checklist execution and reproducible regression runs.

This is distinct from existing Docker-related work:

  • 4c-T1/4c-T2 (closeout master plan): prove the docker runtime type works end-to-end (a tool dispatched into a container). That work is about the runner; this issue is about test-environment isolation.
  • #42 (Test matrix / cross-environment testing substrate): testing dazzlecmd across multiple OSes / Python versions. Adjacent: dazzlecmd-docker is a smaller-scope building block of #42.
  • X-1/X-5 (closeout master plan): dazzlecmd-lib's own CI matrix. Adjacent: same containerization tooling, different consumer.

Motivation

Two recurring problems:

Problem 1 — Tester agent isolation is currently per-checklist discipline

Each test checklist hand-writes DAZZLECMD_CONFIG=%TEMP%\... plumbing separately for cmd.exe, PowerShell, and POSIX shells. The tester agent (per user direction) "should ALWAYS be trying to work on some temp copy not real data where possible." Today this is enforced only by checklist-author discipline, so a single oversight in one step can contaminate the developer's real ~/.dz/config.json.

A canonical container makes this structural: the container has no host filesystem access except explicitly mounted directories, so there is no ~/.dz/ to leak into. The tester agent runs docker run --rm dazzlecmd:test pytest (or equivalent), and any state mutation dies with the container.
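For illustration, the isolated invocation can be assembled as an argv list. This is a sketch, not the planned wrapper: the image tag dazzlecmd:test and the /work and /checklists mount points come from this issue, while the helper name build_docker_run_argv and mounting both paths read-only are assumptions.

```python
from pathlib import Path

IMAGE = "dazzlecmd:test"  # image tag proposed in this issue


def build_docker_run_argv(project_root: Path, command: list[str]) -> list[str]:
    """Build a `docker run` argv for a throwaway, isolated test run (hypothetical helper)."""
    root = project_root.resolve()
    return [
        "docker", "run",
        "--rm",  # delete the container (and its writable layer) when it exits
        # Only these two host paths are visible inside the container, both read-only.
        "-v", f"{root}:/work:ro",
        "-v", f"{root / 'tests' / 'checklists'}:/checklists:ro",
        "-w", "/work",
        IMAGE,
        *command,
    ]


if __name__ == "__main__":
    import shlex
    print(shlex.join(build_docker_run_argv(Path("."), ["pytest"])))
```

Passing this list straight to subprocess.run avoids the cmd.exe / PowerShell / POSIX quoting differences that the checklists currently handle by hand.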

Problem 2 — Reproducible regression runs across machines

The current gap, in which v0.7.25/26/27/28 had no tester sweep (4 versions without verification, surfaced 2026-04-29), exists partly because tester runs are slow on a real dev box and pollute working state. A container makes a tester run cheap and repeatable: docker run dazzlecmd:test ./run-checklist v0.7.27 should produce the same output on any machine.

This also unlocks parallel checklist execution: spin up 4 containers, run v0.7.25/26/27/28 simultaneously, collect SHIP/HOLD reports.
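Because each container is an independent process, the parallel sweep needs little more than a thread pool. A minimal sketch: it shells out to the Phase 2 tools/docker/run-checklist.sh wrapper and assumes an exit code of 0 means every automated step passed; the function names are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

VERSIONS = ["v0.7.25", "v0.7.26", "v0.7.27", "v0.7.28"]


def run_in_container(version: str) -> int:
    """Run one version's checklist via the Phase 2 wrapper and return its exit code."""
    proc = subprocess.run(["./tools/docker/run-checklist.sh", version])
    return proc.returncode


def sweep(versions: list[str],
          runner: Callable[[str], int] = run_in_container) -> dict[str, int]:
    """Run each version's checklist concurrently.

    Threads are sufficient: the heavy work happens in separate container
    processes, so the pool threads mostly wait on subprocesses."""
    with ThreadPoolExecutor(max_workers=len(versions) or 1) as pool:
        # map() preserves input order, so results line up with versions.
        return dict(zip(versions, pool.map(runner, versions)))
```

Since each run gets its own container and writable layer, concurrent checklists cannot see one another's config changes, which is what makes running them side by side safe.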

Proposed solution

Phase 1: Base image

Dockerfile at repo root or tools/docker/Dockerfile:

  • Base: python:3.11-slim (or matrix later)
  • Install dazzlecmd + dazzlecmd-lib in editable mode
  • Install pytest and the runners' optional deps (docker CLI, node, bun, pwsh, per the #42 expansion)
  • Mount points: /work for the project under test; /checklists for tests/checklists/
  • Default entrypoint: pytest or a wrapper script
  • No persistent volumes — container is ephemeral
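A wrapper entrypoint is a natural place to make the temp-config plumbing automatic rather than per-checklist. A minimal sketch of such a wrapper in Python, assuming DAZZLECMD_CONFIG names the config file path (as the ~/.dz/config.json default suggests) and that dazzlecmd honors it; run_isolated and main are hypothetical names.

```python
import os
import subprocess
import sys
import tempfile
from typing import Optional


def run_isolated(command: list[str]) -> int:
    """Run `command` with DAZZLECMD_CONFIG pointing into a fresh temp dir.

    The temp dir is deleted as soon as the command exits, so stray config
    writes are discarded even before the container itself goes away."""
    with tempfile.TemporaryDirectory(prefix="dz-test-") as tmp:
        env = dict(os.environ)
        env["DAZZLECMD_CONFIG"] = os.path.join(tmp, "config.json")
        return subprocess.run(command, env=env).returncode


def main(argv: Optional[list[str]] = None) -> int:
    # Default to pytest when no command is given, mirroring the proposed entrypoint.
    args = sys.argv[1:] if argv is None else argv
    return run_isolated(args or ["pytest"])
```

As the container entrypoint, the script would end with sys.exit(main()); any command passed after the image name in docker run replaces the default pytest.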

Phase 2: Test runner script

tools/docker/run-checklist.sh (POSIX) + tools/docker/run-checklist.cmd (Windows):

  • Takes a checklist filename or version: ./run-checklist v0.7.27
  • Spins up the container with the project tree mounted read-only
  • Executes the automated portions of the checklist
  • Outputs structured PASS/FAIL/MANUAL report (same format as the existing tester agent emits)
  • Optional --shell to drop into the container for manual steps
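Both wrappers need the same first step: turn either a checklist filename or a bare version into concrete files under tests/checklists/. A Python sketch of that logic, assuming checklist names follow the v0.7.NN__Tool__name.md pattern from the acceptance criteria; resolve_checklist and parse_args are hypothetical, and --shell is only parsed here, not implemented.

```python
import argparse
from pathlib import Path


def resolve_checklist(arg: str, checklist_dir: Path) -> list[Path]:
    """Map a checklist filename or a version like 'v0.7.27' to checklist file(s)."""
    direct = Path(arg)
    if direct.suffix == ".md":
        # Accept a path as given, or a bare filename inside checklist_dir.
        candidate = direct if direct.is_file() else checklist_dir / direct.name
        if not candidate.is_file():
            raise FileNotFoundError(candidate)
        return [candidate]
    # Bare version: every checklist whose name starts with '<version>__'.
    matches = sorted(checklist_dir.glob(f"{arg}__*.md"))
    if not matches:
        raise FileNotFoundError(f"no checklist for {arg} in {checklist_dir}")
    return matches


def parse_args(argv: list[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="run-checklist")
    parser.add_argument("target", help="checklist filename or version, e.g. v0.7.27")
    parser.add_argument("--shell", action="store_true",
                        help="drop into the container for manual steps")
    return parser.parse_args(argv)
```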

Phase 3: Tester agent integration

The tester agent (tester subagent, per its definition) gains awareness of the container:

  • Detects tools/docker/Dockerfile exists
  • Defaults to running checklist commands inside the container
  • Falls back to host execution if Docker unavailable, with a warning
  • Reports any test that requires host-side action (interactive prompts, GUI) as MANUAL
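The detection-and-fallback behavior above, plus the --no-container escape hatch recommended under the decision points, fits in one small function. A sketch with hypothetical names; treating a failing docker info as "Docker unavailable" is an assumption (it also catches a stopped daemon, not just a missing CLI).

```python
import shutil
import subprocess
import warnings
from pathlib import Path


def docker_available() -> bool:
    """True if a docker CLI is on PATH and the daemon answers `docker info`."""
    if shutil.which("docker") is None:
        return False
    try:
        result = subprocess.run(["docker", "info"], capture_output=True, timeout=15)
        return result.returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        return False


def choose_mode(project_root: Path, no_container: bool = False,
                have_docker=docker_available) -> str:
    """Return 'container' or 'host' for checklist execution."""
    if no_container:
        return "host"  # explicit opt-out
    if not (project_root / "tools" / "docker" / "Dockerfile").is_file():
        return "host"  # project has no container substrate
    if not have_docker():
        warnings.warn("Docker unavailable; running checklist on the host (not isolated)")
        return "host"
    return "container"
```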

Phase 4: CI integration

GitHub Actions workflow .github/workflows/checklist-runs.yml:

  • On every release-candidate tag, runs all checklists in containers
  • Posts SHIP/HOLD report as a comment on the corresponding GitHub release
  • Catches regressions before tag-and-release
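The workflow's two decisions, whether a pushed tag is a release candidate and whether the sweep says SHIP or HOLD, could live in a small Python helper that a workflow step calls. Both the -rcN tag pattern and the "any FAIL means HOLD" rule are assumptions; the issue does not define either.

```python
import re
from collections import Counter

# Assumed tag convention, e.g. v0.7.29-rc1; adjust to the project's real scheme.
RC_TAG = re.compile(r"^v\d+\.\d+\.\d+-rc\d+$")


def is_release_candidate(tag: str) -> bool:
    return RC_TAG.match(tag) is not None


def verdict(step_results: dict[str, str]) -> str:
    """Summarize PASS/FAIL/MANUAL step results as a one-line SHIP/HOLD report.

    Assumed policy: any FAIL means HOLD; MANUAL steps do not block but are
    counted so a human can follow up before the final release is tagged."""
    counts = Counter(step_results.values())  # missing keys count as 0
    decision = "HOLD" if counts["FAIL"] else "SHIP"
    return (f"{decision}: {counts['PASS']} pass, {counts['FAIL']} fail, "
            f"{counts['MANUAL']} manual")
```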

Phase 5: Multi-distro matrix (#42 territory)

Multi-arch / multi-distro images (Alpine, Debian, Ubuntu, RHEL-family) — defers to #42's full scope but the base infrastructure is in place.

Acceptance criteria

  • Dockerfile builds a working dazzlecmd:test image with dazzlecmd installed + pytest available
  • Container runs the existing pytest suite green (937+ tests)
  • tools/docker/run-checklist.sh exists and runs a named checklist's automated portions inside the container
  • Container has zero persistent state; docker run --rm leaves no trace
  • Tester agent definition updated to prefer container execution when Dockerfile exists
  • Documentation: docs/guides/test-isolation-with-docker.md covering when/why to use container runs
  • CI workflow runs container-based checklist sweeps on release-candidate tags
  • All Phase 4e checklists (v0.7.25/26/27/28) verified through the new container substrate
  • Human test checklist for the docker test substrate itself: tests/checklists/v0.7.NN__Tool__dazzlecmd-docker.md
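The "zero persistent state" criterion can be checked from the host side: snapshot the directories a leak would touch (for example ~/.dz/ and the mounted project tree), run the container, and compare. A minimal sketch; which directories to watch is an assumption, and it detects content changes only, not permission or timestamp changes.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file under `root` (relative POSIX path) to a SHA-256 of its contents."""
    if not root.exists():
        return {}
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_snapshots(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Describe files added, removed, or modified between two snapshots."""
    changes = [f"added {p}" for p in sorted(after.keys() - before.keys())]
    changes += [f"removed {p}" for p in sorted(before.keys() - after.keys())]
    changes += [f"modified {p}" for p in sorted(before.keys() & after.keys())
                if before[p] != after[p]]
    return changes
```

An empty diff after a run is the pass condition; separately, docker ps -a --filter ancestor=dazzlecmd:test should list no leftover containers after a --rm run.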

Decision points

  1. Image base: python:3.11-slim for size, or a fuller distro for runner coverage (python:3.11 non-slim, ubuntu:22.04)? Trade-off: slim is fast to build, but without the extra tooling the runner tests can't be fully exercised.
  2. Where does it live: tools/docker/ in the dazzlecmd repo (where it's used), or a separate dazzlecmd-docker repo coordinated with the dazzlecmd-lib extraction (#53: subtree-split + own CI + PyPI namespace)? Recommendation: tools/docker/ for now; extract it if it grows to cover the full test matrix.
  3. Tester agent default: opt-in via flag, or default-on when Docker is available? Recommendation: default-on with --no-container escape hatch.
  4. Checklist format implication: do existing checklists need rewriting to be container-friendly? They use DAZZLECMD_CONFIG=%TEMP%\... patterns that work both ways. Recommendation: no rewrite; the container substitutes its own DAZZLECMD_CONFIG automatically.

Related issues

Analysis

  • Master closeout plan: 2026-04-29__07-41-11__claude-plan__0-7-x-closeout-ultraplan.md (this issue will be added as cross-cutting infrastructure, sibling to X-1 lib extraction)
  • Origin of concern: tester-agent-isolation gap surfaced 2026-04-29 (no tester sweep on v0.7.25/26/27/28 + per-checklist temp-copy plumbing is fragile)

Metadata

  • Labels: architecture (Structural and design decisions), enhancement (New feature or request)
  • Assignees, type, projects, milestone, relationships, development: none