Add Docker site environments and integration tests #12
- Add Dockerfiles and configuration for 6 sites: gitlab, map, reddit, shopping, shopping_admin, wikipedia
- Add docker-compose.yml for orchestrating all services
- Add integration tests with Playwright for each site
- Add dev utilities for logging, git, network, and path operations
- Add environment settings and tasks for building/managing containers
- Move contributing code to dev directory

Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Split dev tasks into category files: code_tasks, data_tasks, docs_tasks, env_tasks
- Move docker_build to top-level task in tasks.py
- Move monitoring config to assets/environments/monitoring/
- Remove template-dependent tasks and dev/templates/
- Add Docker sites CI workflow

Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Fix site names from hyphens to underscores (shopping_admin not shopping-admin)
- Update Available Sites table with all 6 sites including Map
- Add Env-Ctrl ports column to tables
- Fix image names to current convention (am1n3e/webarena-verified-<site>)
- Update directory structure from contributing/ to dev/environments/
- Add Docker Compose quick start instructions
- Add Data Management commands (data-download, setup)
- Update Base Image Pipeline with correct script names
- Add Environment Variables reference for Docker Compose

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Pull request overview
Adds Docker-based infrastructure to build/run WebArena site environments locally (via Docker Compose and per-site images), plus Playwright/HTTP integration tests and Invoke tooling/CI to exercise those environments.
Changes:
- Introduces Docker Compose orchestration (including Gatus monitoring) and per-site Docker build assets for multiple WebArena environments.
- Adds Playwright + basic HTTP + env-ctrl integration tests for the Dockerized sites.
- Expands Invoke task surface area and dev utilities to manage images/containers and improve CLI output.
Reviewed changes
Copilot reviewed 132 out of 136 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| uv.lock | Adds Docker-related Python dependencies to the lockfile. |
| tests/integration/environments/wikipedia/test_playwright.py | Adds Playwright UI tests for Wikipedia container. |
| tests/integration/environments/wikipedia/test_basic.py | Adds basic HTTP smoke test for Wikipedia. |
| tests/integration/environments/wikipedia/conftest.py | Adds Wikipedia-specific pytest fixtures/CLI options usage. |
| tests/integration/environments/wikipedia/__init__.py | Declares Wikipedia integration test package. |
| tests/integration/environments/test_env_ctrl.py | Adds shared env-ctrl API/exec tests across sites. |
| tests/integration/environments/shopping_admin/test_playwright.py | Adds Playwright UI tests for Magento admin. |
| tests/integration/environments/shopping_admin/test_basic.py | Adds basic HTTP tests for Magento admin. |
| tests/integration/environments/shopping_admin/conftest.py | Adds Shopping Admin fixtures (credentials/autologin helpers). |
| tests/integration/environments/shopping_admin/__init__.py | Declares Shopping Admin integration test package. |
| tests/integration/environments/shopping/test_basic.py | Adds basic HTTP tests for Magento storefront. |
| tests/integration/environments/shopping/conftest.py | Adds Shopping fixtures (credentials/autologin helpers). |
| tests/integration/environments/shopping/__init__.py | Declares Shopping integration test package. |
| tests/integration/environments/reddit/test_basic.py | Adds basic HTTP tests for Reddit/Postmill. |
| tests/integration/environments/reddit/conftest.py | Adds Reddit fixtures (credentials/autologin helpers). |
| tests/integration/environments/reddit/__init__.py | Declares Reddit integration test package. |
| tests/integration/environments/map/test_playwright.py | Adds Playwright UI tests for the map site. |
| tests/integration/environments/map/test_basic.py | Adds basic HTTP tests for map site + tiles. |
| tests/integration/environments/map/conftest.py | Adds map-specific pytest fixtures. |
| tests/integration/environments/map/__init__.py | Declares map integration test package. |
| tests/integration/environments/gitlab/test_playwright.py | Adds Playwright UI tests for GitLab. |
| tests/integration/environments/gitlab/test_basic.py | Adds basic HTTP tests for GitLab. |
| tests/integration/environments/gitlab/conftest.py | Adds GitLab fixtures and a logged-in page helper. |
| tests/integration/environments/gitlab/__init__.py | Declares GitLab integration test package. |
| tests/integration/environments/conftest.py | Adds global integration-test CLI options and env-ctrl client factories. |
| tests/integration/environments/__init__.py | Declares integration environments test package. |
| tasks.py | Reworks Invoke entrypoint and adds Compose/image build tasks. |
| pyproject.toml | Adds dev deps + pytest markers for Docker integration tests. |
| docker-compose.yml | Adds multi-service Compose file for all site environments + monitor. |
| dev/utils/path_utils.py | Adds helper for locating repo root. |
| dev/utils/network_utils.py | Adds helper for finding a free port. |
| dev/utils/logging_utils/step_context.py | Adds Rich-backed step context manager for CLI output. |
| dev/utils/logging_utils/console.py | Adds Rich console setup and import guard. |
| dev/utils/logging_utils/__init__.py | Exposes logging utilities for dev tooling. |
| dev/utils/git_utils.py | Adds helper for resolving git short SHA. |
| dev/environments/tasks.py | Adds envs.* Invoke namespace and site listing task. |
| dev/environments/settings.py | Adds centralized per-site Docker/image/data settings. |
| dev/environments/docker/utils/sites.py | Adds site registry helpers for Docker tasks. |
| dev/environments/docker/utils/downloads.py | Adds Docker/data download helpers for images and artifacts. |
| dev/environments/docker/utils/dockerfile.py | Adds Dockerfile parsing for container port discovery. |
| dev/environments/docker/utils/docker_setup_helpers.py | Adds shared Docker volume setup helpers. |
| dev/environments/docker/utils/containers.py | Adds container lifecycle helpers and env-ctrl wiring. |
| dev/environments/docker/utils/__init__.py | Adds shared Docker utils exports/constants. |
| dev/environments/docker/sites/wikipedia/supervisord.conf | Adds supervisord config to run kiwix + env-ctrl. |
| dev/environments/docker/sites/wikipedia/scripts/setup_volumes.sh | Adds volume setup script for Wikipedia ZIM. |
| dev/environments/docker/sites/wikipedia/scripts/docker_setup.py | Adds Python wrapper to set up Wikipedia volumes. |
| dev/environments/docker/sites/wikipedia/index.html | Adds redirect landing page for Wikipedia container. |
| dev/environments/docker/sites/wikipedia/entrypoint.sh | Adds Wikipedia entrypoint to configure ZIM and start supervisor. |
| dev/environments/docker/sites/wikipedia/Dockerfile | Adds Dockerfile for Wikipedia site image. |
| dev/environments/docker/sites/shopping_admin/scripts/90_verify.sh | Adds post-cleanup DB verification script for shopping_admin base image. |
| dev/environments/docker/sites/shopping_admin/scripts/60_cleanup.sh | Adds cleanup script to reduce image size for shopping_admin base image. |
| dev/environments/docker/sites/shopping_admin/scripts/30_optimize.sh | Adds image optimization script for shopping_admin assets. |
| dev/environments/docker/sites/shopping_admin/scripts/10_post_patch.sh | Adds post-patch script to start/configure/compile Magento admin. |
| dev/environments/docker/sites/shopping_admin/scripts/00_apply_patches.sh | Adds patch/bootstrap script for shopping_admin. |
| dev/environments/docker/sites/shopping_admin/docker_overrides/registration.php | Adds Magento module registration for admin autologin. |
| dev/environments/docker/sites/shopping_admin/docker_overrides/module.xml | Adds Magento module definition for admin autologin. |
| dev/environments/docker/sites/shopping_admin/docker_overrides/etc/supervisor.d/env-ctrl.ini | Adds supervisor program config for env-ctrl server. |
| dev/environments/docker/sites/shopping_admin/docker_overrides/etc/supervisor.d/env-ctrl-init.ini | Adds supervisor program config for env-ctrl init. |
| dev/environments/docker/sites/shopping_admin/docker_overrides/entrypoint.sh | Adds entrypoint wrapper for runtime tuning (nginx/mysql/es). |
| dev/environments/docker/sites/shopping_admin/docker_overrides/di.xml | Adds Magento DI plugins wiring for autologin + mass action restrictions. |
| dev/environments/docker/sites/shopping_admin/docker_overrides/README.md | Documents shopping_admin Docker overrides/autologin behavior. |
| dev/environments/docker/sites/shopping_admin/docker_overrides/DisableReviewMassActionsPlugin.php | Adds Magento plugin to disable review mass actions. |
| dev/environments/docker/sites/shopping_admin/docker_overrides/DisableProductMassActionsPlugin.php | Adds Magento plugin to disable product mass actions. |
| dev/environments/docker/sites/shopping_admin/docker_overrides/AutoLoginPlugin.php | Adds Magento plugin to support header-based admin login. |
| dev/environments/docker/sites/shopping_admin/Dockerfile | Adds final shopping_admin image Dockerfile (env-ctrl + entrypoint). |
| dev/environments/docker/sites/shopping/scripts/90_verify.sh | Adds post-cleanup DB verification script for shopping base image. |
| dev/environments/docker/sites/shopping/scripts/60_cleanup.sh | Adds cleanup script to reduce image size for shopping base image. |
| dev/environments/docker/sites/shopping/scripts/20_optimize.sh | Adds image optimization script for shopping assets. |
| dev/environments/docker/sites/shopping/scripts/00_apply_patches.sh | Adds patch/bootstrap script for shopping storefront. |
| dev/environments/docker/sites/shopping/docker_overrides/etc/supervisor.d/env-ctrl.ini | Adds supervisor program config for env-ctrl server (shopping). |
| dev/environments/docker/sites/shopping/docker_overrides/etc/supervisor.d/env-ctrl-init.ini | Adds supervisor program config for env-ctrl init (shopping). |
| dev/environments/docker/sites/shopping/docker_overrides/entrypoint.sh | Adds entrypoint wrapper for runtime tuning (shopping). |
| dev/environments/docker/sites/shopping/docker_overrides/CustomerAutoLogin/registration.php | Adds Magento module registration for customer autologin. |
| dev/environments/docker/sites/shopping/docker_overrides/CustomerAutoLogin/etc/module.xml | Adds Magento module definition for customer autologin. |
| dev/environments/docker/sites/shopping/docker_overrides/CustomerAutoLogin/etc/di.xml | Adds Magento DI wiring for customer autologin plugin. |
| dev/environments/docker/sites/shopping/docker_overrides/CustomerAutoLogin/README.md | Documents customer autologin module behavior. |
| dev/environments/docker/sites/shopping/docker_overrides/CustomerAutoLogin/Plugin/CustomerAutoLoginPlugin.php | Adds Magento plugin to support header-based customer login. |
| dev/environments/docker/sites/shopping/Dockerfile | Adds final shopping image Dockerfile (env-ctrl + entrypoint). |
| dev/environments/docker/sites/reddit/scripts/90_verify.sh | Adds DB verification script for reddit base image. |
| dev/environments/docker/sites/reddit/scripts/60_cleanup.sh | Adds cleanup script for reddit base image. |
| dev/environments/docker/sites/reddit/scripts/20_optimize.sh | Adds aggressive image optimization for reddit base image. |
| dev/environments/docker/sites/reddit/scripts/00_apply_patches.sh | Adds patch/bootstrap script for reddit. |
| dev/environments/docker/sites/reddit/docker_overrides/security.yaml | Adds Symfony security overrides for header-based auth. |
| dev/environments/docker/sites/reddit/docker_overrides/http_client.yaml | Adds Symfony HTTP client overrides for URL rewriting. |
| dev/environments/docker/sites/reddit/docker_overrides/etc/supervisor.d/env-ctrl.ini | Adds supervisor env-ctrl config for reddit. |
| dev/environments/docker/sites/reddit/docker_overrides/etc/supervisor.d/env-ctrl-init.ini | Adds supervisor env-ctrl init config for reddit. |
| dev/environments/docker/sites/reddit/docker_overrides/VoteManager.php | Patches reddit vote handling to preserve imported net score. |
| dev/environments/docker/sites/reddit/docker_overrides/Votable.php | Adds/adjusts vote contract for delta-based scoring. |
| dev/environments/docker/sites/reddit/docker_overrides/UrlRewritingHttpClient.php | Adds URL rewriting HTTP client decorator. |
| dev/environments/docker/sites/reddit/docker_overrides/README.md | Documents reddit override rationale and usage. |
| dev/environments/docker/sites/reddit/docker_overrides/HeaderAutologinAuthenticator.php | Adds header-based authenticator for reddit. |
| dev/environments/docker/sites/reddit/Dockerfile | Adds final reddit image Dockerfile (env-ctrl). |
| dev/environments/docker/sites/map/scripts/warmup_tiles.py | Adds tile cache warmup script. |
| dev/environments/docker/sites/map/scripts/setup_volumes.sh | Adds map volume download/extract script. |
| dev/environments/docker/sites/map/scripts/import-osm-data.sh | Adds helper script to import OSM data into DB. |
| dev/environments/docker/sites/map/scripts/docker_setup.py | Adds Python wrapper to set up map volumes. |
| dev/environments/docker/sites/map/docker_overrides/tile-server.conf | Adds Apache vhost proxy config for tiles/OSRM/Nominatim/Rails. |
| dev/environments/docker/sites/map/docker_overrides/supervisord.conf | Adds supervisor config to run map stack + env-ctrl. |
| dev/environments/docker/sites/map/docker_overrides/settings.local.yml | Adds OSM site settings overrides for internal services. |
| dev/environments/docker/sites/map/docker_overrides/nominatim.conf | Adds Apache vhost config for Nominatim. |
| dev/environments/docker/sites/map/docker_overrides/database.yml | Adds Rails DB config override. |
| dev/environments/docker/sites/gitlab/scripts/90_verify.sh | Adds GitLab DB verification script. |
| dev/environments/docker/sites/gitlab/scripts/70_configure.sh | Adds build-time gitlab reconfigure script. |
| dev/environments/docker/sites/gitlab/scripts/60_cleanup.sh | Adds cleanup script for GitLab base image. |
| dev/environments/docker/sites/gitlab/scripts/20_optimize.sh | Adds no-op optimize script for GitLab. |
| dev/environments/docker/sites/gitlab/scripts/00_apply_patches.sh | Adds env-ctrl bootstrap script for GitLab base image. |
| dev/environments/docker/sites/gitlab/docker_overrides/gitlab.small.rb | Adds small-footprint GitLab config. |
| dev/environments/docker/sites/gitlab/docker_overrides/gitlab.rb | Adds default optimized GitLab config. |
| dev/environments/docker/sites/gitlab/docker_overrides/gitlab.large.rb | Adds larger-footprint GitLab config. |
| dev/environments/docker/sites/gitlab/docker_overrides/entrypoint.sh | Adds custom GitLab entrypoint (runsvdir + optional reconfigure + env-ctrl). |
| dev/environments/docker/sites/gitlab/Dockerfile | Adds final GitLab image Dockerfile (env-ctrl + entrypoint). |
| dev/environments/docker/__init__.py | Exposes docker tasks namespace. |
| dev/environments/README.md | Adds documentation for Docker image management workflow. |
| dev/env_tasks.py | Adds Invoke task for initializing dev environment. |
| dev/docs_tasks.py | Adds Invoke tasks for building/serving/deploying docs. |
| dev/data_tasks.py | Adds Invoke tasks for dataset formatting. |
| dev/code_tasks.py | Adds Invoke tasks for lint/format/type-check. |
| dev/__init__.py | Removes previous dev package docstring placeholder. |
| assets/environments/monitoring/config.compose.yaml | Adds Gatus config for Compose monitoring dashboard. |
| .github/workflows/test.yml | Excludes new environment integration tests from default test job. |
| .github/workflows/test-docker-sites.yml | Adds CI workflow intended to run Docker environment integration tests. |

    # Service config: display name, port env var, default port
    SERVICES = {
        "shopping_admin": ("Shopping Admin", "WA_SHOPPING_ADMIN_PORT", 7780),
        "shopping": ("Shopping", "WA_SHOPPING_PORT", 7770),
        "gitlab": ("GitLab", "WA_GITLAB_PORT", 8023),
        "reddit": ("Reddit", "WA_REDDIT_PORT", 9999),
        "wikipedia": ("Wikipedia", "WA_WIKIPEDIA_PORT", 8888),
        "monitor": ("Gatus Dashboard", "WA_MONITOR_PORT", 8870),
    }

SERVICES/SERVICE_DESCRIPTIONS omit the map service, but docker-compose.yml defines a map service and the monitor container expects WA_MAP_* env vars. This causes inv compose.up --service map to not set the monitor display name, and inv compose.up output won’t include the map URL. Add map to SERVICES/SERVICE_DESCRIPTIONS (and update _get_service_url) or remove map from compose/monitor to keep behavior consistent.

    print(
        "ERROR: Rich library is required for CLI output formatting.\n"
        "Rich is a dev dependency. Install it with:\n\n"
        " uv sync --group dev\n\n"
        "Or install rich directly:\n\n"
        " uv pip install rich\n",
        file=sys.stderr,
    )

Print statement may execute during import.
Renamed tasks to avoid namespace-prefixed names:
- dev.docs.docs-serve → dev.docs.serve
- dev.docs.docs-build → dev.docs.build
- dev.docs.docs-deploy → dev.docs.deploy
- dev.code.code-format-and-check → dev.code.format
- dev.data.data-format → dev.data.format
- dev.env.env-init → dev.env.init
- demo.demo-gitlab-start → demo.gitlab-start
- demo.demo-gitlab-stop → demo.gitlab-stop

Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Create dedicated doc pages for each environment (shopping_admin, shopping, reddit, gitlab, wikipedia, map)
- Move shared Docker info to index.md (size improvements, env vars, commands)
- Add announcement about Docker images availability to README
- Update map.md to explain single-container optimization vs original 5 containers
- Remove docker_images.md (content redistributed)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Add dev/ci_tasks.py for CI-related invoke tasks
- Add Dockerfile.ci for Wikipedia environment
- Move site README files from docker_overrides/ to sites/ level
- Update GitHub workflow and gitignore
- Update tasks.py imports

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Add section showing how to run WebArena environments with docker run
- Remove map NOTES.md
- Disable test-docker-sites.yml workflow temporarily

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Pull request overview
Copilot reviewed 144 out of 149 changed files in this pull request and generated 11 comments.

    logging_utils.print_info(f"Running tests with marker: {marker}")
    headed_flag = " --headed" if headed else ""
    site_url = f"http://localhost:{port}"
    env_ctrl_url = f"http://localhost:{env_ctrl_port}"
    pytest_args = f"--{site}_url={site_url} --{site}_env_ctrl_url={env_ctrl_url}"
    ctx.run(f"uv run pytest tests/integration/environments/ -m {marker} -v {pytest_args}{headed_flag}")

docker_test builds pytest CLI args assuming every site supports both --{site}_url and --{site}_env_ctrl_url, but Map tests require --map_url and --map_tile_url (and tests/integration/environments/conftest.py does not define --map_env_ctrl_url). As written, inv envs.docker.test --site=map will error on unknown CLI args / missing required args, and the CI matrix includes map so this is likely to fail the workflow. Consider making pytest args site-specific (e.g., pass --map_url and --map_tile_url for map, and only pass --{site}_env_ctrl_url for sites that support it) or add the missing CLI option/fixtures for map env-ctrl tests.
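
One way to make the pytest arguments site-specific is sketched below. `build_pytest_args` is a hypothetical helper, not a function from this PR, and it assumes the Map tile server is reachable on the same published port as the Map site itself.

```python
def build_pytest_args(*, site: str, port: int, env_ctrl_port: int) -> list[str]:
    """Return the pytest CLI options for one site (hypothetical helper)."""
    site_url = f"http://localhost:{port}"
    if site == "map":
        # Map tests take --map_url and --map_tile_url; conftest.py defines no
        # --map_env_ctrl_url option, so it is not passed here.
        # Assumption: tiles are served from the same published port as the site.
        return [f"--map_url={site_url}", f"--map_tile_url={site_url}"]
    env_ctrl_url = f"http://localhost:{env_ctrl_port}"
    return [f"--{site}_url={site_url}", f"--{site}_env_ctrl_url={env_ctrl_url}"]
```

The returned list can be joined with spaces before it is interpolated into the `ctx.run(...)` command string.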

    # Map (OpenStreetMap) site
    parser.addoption(
        "--map_url",
        action="store",
        default=None,

The integration test CLI options define --map_url and --map_tile_url, but there is no --map_env_ctrl_url option even though other sites use --{site}_env_ctrl_url and envs.docker.test currently passes --{site}_env_ctrl_url for every site. Either add --map_env_ctrl_url here (and corresponding fixtures/tests), or adjust the docker test runner to not pass env-ctrl args for map.

    SITES = [
        pytest.param("shopping", marks=pytest.mark.integration_docker_shopping),
        pytest.param("shopping_admin", marks=pytest.mark.integration_docker_shopping_admin),
        pytest.param("reddit", marks=pytest.mark.integration_docker_reddit),
        pytest.param("gitlab", marks=pytest.mark.integration_docker_gitlab),
        pytest.param("wikipedia", marks=pytest.mark.integration_docker_wikipedia),

SITES includes shopping/shopping_admin/reddit/gitlab/wikipedia but omits map, even though the repo defines an integration_docker_map marker and the Map container exposes env-ctrl. If map is expected to support env-ctrl (it appears to from the Docker Compose + supervisord config), add it to this parametrization so env-ctrl coverage is consistent across sites.

    - name: "${WA_REDDIT_NAME}"
      link: "http://localhost:${WA_REDDIT_PORT}/"
    - name: "${WA_WIKIPEDIA_NAME}"
      link: "http://localhost:${WA_WIKIPEDIA_PORT}/"

The Gatus UI buttons list does not include Map, even though docker-compose.yml passes WA_MAP_PORT/WA_MAP_NAME. Add a Map button entry so the monitor dashboard links to the Map service as intended.

Suggested change:

    -   link: "http://localhost:${WA_WIKIPEDIA_PORT}/"
    +   link: "http://localhost:${WA_WIKIPEDIA_PORT}/"
    + - name: "${WA_MAP_NAME}"
    +   link: "http://localhost:${WA_MAP_PORT}/"


    - name: "${WA_WIKIPEDIA_NAME}"
      url: "http://wikipedia:8080/"
      interval: 60s
      conditions:
        - "[STATUS] == 200"

The Gatus endpoints list does not include a health check for Map, even though Compose provides WA_MAP_* env vars and runs a map service. Consider adding a Map endpoint (e.g., http://map:8080/) with an appropriate status condition so the dashboard reflects Map availability too.

    @@ -0,0 +1,28 @@
    # Shopping Admin (Magento)

    Port: 6680

This README says Port: 6680, but the Docker Compose file and docs for Shopping Admin use port 7780. Please update this to avoid confusing users about the default port mapping.

    | Site | Port | Fixes |
    |------|------|-------|
    | [shopping_admin](sites/shopping_admin/) | 6680 | Header auth, mass action protection |
    | [shopping](sites/shopping/) | 7770 | Header auth for customers |
    | [reddit](sites/reddit/) | 9999 | Vote system, header auth, URL rewriting, rate limits |

The site table lists Shopping Admin as port 6680, but elsewhere in this PR (docker-compose.yml, docs/environments/shopping_admin.md, tasks.py) Shopping Admin is 7780. Align this table with the actual default port to avoid misleading users.

    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        return s.getsockname()[1]

find_free_port() sets SO_REUSEADDR after calling bind(), which is ineffective on most platforms (the option must be set before binding to influence the bind). Consider moving setsockopt() before bind() or dropping it entirely since this function is only probing for an ephemeral port.
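
A minimal sketch of the simpler of the two options (dropping the socket option entirely), assuming the function only needs to report an ephemeral port chosen by the OS:

```python
import socket


def find_free_port() -> int:
    """Ask the OS for an unused TCP port and return its number."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Binding to port 0 lets the kernel pick a free ephemeral port.
        s.bind(("", 0))
        return s.getsockname()[1]
```

The port is released as soon as the socket closes, so another process could claim it before the caller binds to it; callers that cannot tolerate that race would need to keep the socket open instead.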

    def _get_service_url(service: str) -> str:
        """Get the localhost URL for a service."""
        if service in SERVICES:
            _, _, default_port = SERVICES[service]
            return f"http://localhost:{default_port}"
        return ""

_get_service_url() always prints URLs using the hard-coded default ports from SERVICES, so the output becomes wrong when users override ports via environment variables (e.g., WA_SHOPPING_PORT). Consider reading the env var (the second tuple element in SERVICES) and falling back to default_port only when it is unset.
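
The env-var fallback described above could look roughly like this. The sketch uses a trimmed copy of `SERVICES` and the public name `get_service_url` for illustration; treating an empty variable the same as an unset one is an assumption, not behavior confirmed by the PR.

```python
import os

# Trimmed copy of the SERVICES mapping: display name, port env var, default port.
SERVICES = {
    "shopping_admin": ("Shopping Admin", "WA_SHOPPING_ADMIN_PORT", 7780),
    "shopping": ("Shopping", "WA_SHOPPING_PORT", 7770),
}


def get_service_url(service: str) -> str:
    """Return the localhost URL for a service, honoring port overrides."""
    if service not in SERVICES:
        return ""
    _, port_env_var, default_port = SERVICES[service]
    # Use the override when it is set and non-empty; otherwise the default.
    port = os.environ.get(port_env_var) or default_port
    return f"http://localhost:{port}"
```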

    cleanup() {
        echo "Shutting down..."
        gitlab-ctl stop 2>/dev/null || true
        kill $RUNSVDIR_PID 2>/dev/null || true

With set -u, the cleanup() trap can fail if a signal arrives before RUNSVDIR_PID is assigned (unbound variable), causing the entrypoint to crash while handling SIGTERM/SIGINT. Consider initializing RUNSVDIR_PID (and other PIDs) to an empty value and guarding kill calls (e.g., only kill when the variable is set and the process is running).

Suggested change:

    - cleanup() {
    -     echo "Shutting down..."
    -     gitlab-ctl stop 2>/dev/null || true
    -     kill $RUNSVDIR_PID 2>/dev/null || true
    + # Initialize PIDs to avoid unbound-variable errors in traps when using `set -u`
    + RUNSVDIR_PID=""
    + ENV_CTRL_PID=""
    + cleanup() {
    +     echo "Shutting down..."
    +     gitlab-ctl stop 2>/dev/null || true
    +     if [ -n "${RUNSVDIR_PID:-}" ] && kill -0 "$RUNSVDIR_PID" 2>/dev/null; then
    +         kill "$RUNSVDIR_PID" 2>/dev/null || true
    +     fi

- Add dev.ci.setup-wikipedia task to download small Ray Charles ZIM (~2.7MB)
- Add dev.ci.generate-map-data task to generate Monaco test data
- Add --data-dir parameter to envs.docker.test for mounting CI data
- Update Wikipedia tests to work with both small and full ZIM files
- Add Map CI tests for Monaco data
- Split CI workflows into one per site (only wikipedia enabled for testing)
- Remove Wikipedia Dockerfile.ci (use normal build with data mount)
- Store CI data in data/ directory at repo root

Usage:

    inv dev.ci.setup-wikipedia
    inv envs.docker.build --site=wikipedia --tag=test
    inv envs.docker.test --site=wikipedia --tag=test --data-dir=data/wikipedia

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Port container start/stop functionality from dev/ to src/ and add new setup commands for Docker volume management.

New CLI commands:
- `env start/stop/status --site <name>` - Manage Docker containers
- `env start --port/--env-ctrl-port` - Custom port mapping
- `env setup init --site --data-dir` - Download data and create volumes
- `env setup clean --site --force` - Remove Docker volumes

New modules:
- environments/container/ - ContainerManager, defaults, utilities
- environments/setup/ - Volume setup orchestration, Docker operations

Config changes:
- Added ContainerConfig, ContainerSetupConfig, ContainerVolumeSpec types
- Added optional `container` field to EnvironmentConfig

Co-Authored-By: Claude Opus 4.5 <[email protected]>

    from webarena_verified.types.task import WebArenaSite

    # Standard env-ctrl port used across all images
    ENV_CTRL_CONTAINER_PORT = 8877

Remove the global variables; add one default per site.

    ENV_CTRL_CONTAINER_PORT = 8877

    # Volume name prefix for all WebArena containers
    VOLUME_PREFIX = "webarena-verified"

Use the Python package name as the prefix. No global variables.
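
A rough sketch of what per-site defaults without module-level constants might look like. The `SiteDefaults` type, its field names, and `default_for` are hypothetical and only illustrate the shape of the request; the ports are the defaults quoted elsewhere in this conversation, and the prefix assumes the package name is `webarena_verified`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteDefaults:
    """Defaults for a single site (hypothetical type, illustrative fields)."""

    host_port: int
    env_ctrl_port: int
    # Assumption: the Python package name doubles as the volume prefix.
    volume_prefix: str = "webarena_verified"


def default_for(*, site: str) -> SiteDefaults:
    """Look up the defaults for one site; only two sites are shown here."""
    defaults = {
        "shopping": SiteDefaults(host_port=7770, env_ctrl_port=8877),
        "shopping_admin": SiteDefaults(host_port=7780, env_ctrl_port=8877),
    }
    return defaults[site]
```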

        >>> manager.stop()
        """

        def __init__(

Enforce keyword-only arguments using `*` for all functions.
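
For reference, a bare `*` in a signature makes every following parameter keyword-only. The class below is a stand-in written for illustration; its parameters are assumptions and do not mirror the real `ContainerManager` signature.

```python
from typing import Optional


class ContainerManagerSketch:
    """Illustrative stand-in; not the ContainerManager from this PR."""

    def __init__(self, *, site: str, hostname: str = "localhost", port: Optional[int] = None) -> None:
        # Everything after the bare `*` must be passed by name.
        self.site = site
        self.hostname = hostname
        self.port = port
```

Calling `ContainerManagerSketch("gitlab")` raises `TypeError`, while `ContainerManagerSketch(site="gitlab")` works.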

    env_ctrl_url = None

    # Look for the web service port
    container_port_key = f"{self.config.container_port}/tcp"

Use the env-ctrl client to get the status.
- Create ContainerBackend Protocol with DockerBackend implementation
- Move container status types to types/container.py as Pydantic models
- Move DEFAULT_CONTAINER_CONFIGS to environments/container/config.py
- Use pre-computed volume names (webarena_verified_*) instead of suffix
- Use keyword-only arguments (*,) throughout container APIs
- Add hostname parameter to ContainerManager with default "localhost"
- Simplify defaults.py to re-export from config.py

Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Create backend/protocol.py with ContainerBackend Protocol
- Create backend/docker.py with DockerBackend implementation
- Create backend/__init__.py with re-exports and get_default_backend
- Remove defaults.py, import directly from config.py

Co-Authored-By: Claude Opus 4.5 <[email protected]>
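
To illustrate the Protocol-based split these commits describe, here is a small sketch using `typing.Protocol`. The method names (`start`, `stop`, `is_running`), the `InMemoryBackend` test double, and the `restart` function are all assumptions made for the example; the actual `ContainerBackend` interface in the PR may differ.

```python
from typing import Protocol


class ContainerBackend(Protocol):
    """Structural interface a backend must satisfy (method names assumed)."""

    def start(self, *, name: str) -> None: ...
    def stop(self, *, name: str) -> None: ...
    def is_running(self, *, name: str) -> bool: ...


class InMemoryBackend:
    """Test double that matches ContainerBackend without inheriting from it."""

    def __init__(self) -> None:
        self._running = set()

    def start(self, *, name: str) -> None:
        self._running.add(name)

    def stop(self, *, name: str) -> None:
        self._running.discard(name)

    def is_running(self, *, name: str) -> bool:
        return name in self._running


def restart(*, backend: ContainerBackend, name: str) -> bool:
    """Stop then start a container and report whether it is running."""
    backend.stop(name=name)
    backend.start(name=name)
    return backend.is_running(name=name)
```

Because the Protocol is structural, a Docker-backed implementation and a test double can be swapped without sharing a base class.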
Keep __all__ only in __init__.py files per convention. Co-Authored-By: Claude Opus 4.5 <[email protected]>
The PatchManager class and patches directory are no longer needed as patching functionality has been moved to the container initialization process. This removes dead code and simplifies the codebase. Co-Authored-By: Claude Opus 4.5 <[email protected]>
The health monitoring dashboard added complexity without providing sufficient value for local development workflows. Co-Authored-By: Claude Opus 4.5 <[email protected]>
Enables automated testing when changes are made to these Docker environment sites or their integration tests. Co-Authored-By: Claude Opus 4.5 <[email protected]>
Pull request overview
Copilot reviewed 174 out of 179 changed files in this pull request and generated 9 comments.

    if search_input.is_visible(timeout=5000):
        search_input.fill("Ray Charles")
        search_input.press("Enter")

        # Wait for results
        page.wait_for_load_state("networkidle", timeout=pw_timeout)

        # Should show Ray Charles content
        content = page.content().lower()
        assert "ray" in content or "charles" in content, "Search should show Ray Charles content"

This test can pass without validating search behavior: if the search textbox is not visible, the test does nothing and still passes. Make the search box visibility an assertion (or fail fast) before performing the search so the test actually verifies the feature.
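
A fail-fast variant might look like the sketch below. `run_search` is a hypothetical helper; it assumes `search_input` is a Playwright `Locator`, whose `wait_for(state="visible", timeout=...)` raises Playwright's `TimeoutError` when the element never appears. The locator is duck-typed here so the sketch does not need a browser.

```python
def run_search(search_input, query: str, *, timeout_ms: int = 5000) -> None:
    """Submit a search, failing if the search box never becomes visible.

    `search_input` is expected to behave like a Playwright Locator.
    """
    # wait_for raises instead of returning False, so a missing search box
    # fails the test rather than silently skipping the search.
    search_input.wait_for(state="visible", timeout=timeout_ms)
    search_input.fill(query)
    search_input.press("Enter")
```

The result assertions from the original test can then run unconditionally after the call.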

    def test_map_homepage_loads(map_container, map_base_url, page, pw_timeout):
        """Test that the map homepage loads correctly."""
        page.goto(map_base_url)

        # Wait for the map to be visible
        expect(page.locator("#map")).to_be_visible(timeout=pw_timeout)

page.goto(...) is missing the timeout=pw_timeout used elsewhere in these integration tests. This increases flakiness (especially in CI) and makes the timeout configuration inconsistent across tests; pass timeout=pw_timeout here (and in the other goto calls in this file).

    # OSRM route request
    route_url = f"{map_tile_url}/osrm/car/route/v1/driving/{start_lon},{start_lat};{end_lon},{end_lat}?overview=false"

    req = urllib.request.Request(route_url, method="GET")
    req.add_header("Accept", "application/json")

route_url uses /osrm/car/..., but the Apache proxy config for the map container exposes OSRM under /osrm/routed-car/ (see dev/environments/docker/sites/map/docker_overrides/tile-server.conf). As written, this will likely return 404 and (because of the broad HTTPError handling below) still be treated as success.

    try:
        with urllib.request.urlopen(req, timeout=30) as response:
            assert response.status == 200
            content = response.read().decode("utf-8")
            # OSRM returns JSON with "code": "Ok" on success
            assert '"code"' in content

This except urllib.error.HTTPError block allows any 4xx response (including 404 due to a wrong path) to pass the test without assertions. If 4xx responses are acceptable, assert explicitly on the expected codes/message; otherwise let the exception fail the test so a broken endpoint is caught.
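
A stricter check could parse the response body and compare the `code` field, as sketched below. `assert_osrm_ok` is a hypothetical helper; it relies only on the behavior noted in the snippet's own comment, that OSRM returns `"code": "Ok"` on success.

```python
import json


def assert_osrm_ok(*, status: int, body: str) -> None:
    """Assert that an OSRM response represents a successful route lookup."""
    assert status == 200, f"unexpected HTTP status {status}"
    payload = json.loads(body)
    # Checking the value, not just the key, rejects error payloads that
    # also contain a "code" field.
    assert payload.get("code") == "Ok", f"unexpected OSRM code: {payload.get('code')!r}"
```

With this in place the `except urllib.error.HTTPError` branch can be dropped, so a 404 from a wrong proxy path surfaces as a test failure.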
    search_input = page.locator('input[name="query"]').first
    if search_input.is_visible(timeout=5000):
        search_input.fill("Monaco")
        search_input.press("Enter")
This test can silently pass without exercising the search functionality: if the search input isn't visible, nothing is asserted. Prefer asserting that the search input is visible (or failing with a clear message) before using it, so the test actually verifies search works in CI.
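One way to make the precondition explicit is sketched below. It assumes the search input is a Playwright `Locator` and relies on `Locator.wait_for(state="visible", timeout=...)`, which raises when the element does not appear in time. The helper `run_search` and the `FakeLocator` stand-in are hypothetical and exist only so the sketch runs without a browser.

```python
def run_search(search_input, query, timeout_ms=5000):
    """Fill and submit the search box, failing if it never becomes visible.

    With a real Playwright Locator, wait_for raises when the element is not
    visible within the timeout, so the test cannot pass without searching.
    """
    search_input.wait_for(state="visible", timeout=timeout_ms)
    search_input.fill(query)
    search_input.press("Enter")


class FakeLocator:
    """Hypothetical stand-in for a Playwright Locator (no browser needed)."""

    def __init__(self, visible):
        self.visible = visible
        self.actions = []

    def wait_for(self, state, timeout):
        # Playwright raises its own TimeoutError; the builtin approximates it here.
        if not self.visible:
            raise TimeoutError(f"element not {state} after {timeout}ms")

    def fill(self, text):
        self.actions.append(("fill", text))

    def press(self, key):
        self.actions.append(("press", key))


present = FakeLocator(visible=True)
run_search(present, "Monaco")

missing = FakeLocator(visible=False)
try:
    run_search(missing, "Monaco")
    missing_failed = False
except TimeoutError:
    missing_failed = True
```

In the real test, `expect(search_input).to_be_visible(timeout=pw_timeout)`, the assertion already used in the map homepage test, would serve the same purpose.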
    def map_container():
        """Map container name (assumes already running)."""
        return "osm-website"
The map container name fixture returns osm-website, but the Docker Compose service/container name for Map in this PR is webarena-verified-map. This mismatch will confuse users and breaks any future tests/utilities that rely on the fixture; update it to the actual container name used by the tooling.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        return s.getsockname()[1]
SO_REUSEADDR is being set after bind(), which is ineffective on many platforms. Set the socket option before calling bind() (and consider calling listen() if the intent is to reserve the port until the function returns).
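A minimal sketch of the corrected ordering follows. The function name `find_free_port` is illustrative, since the excerpt does not show the name used in the PR.

```python
import socket


def find_free_port():
    """Ask the OS for an unused TCP port, setting SO_REUSEADDR before bind()."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Socket options that affect binding must be set before bind() is called.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind(("", 0))  # port 0 lets the OS pick a free port
        return s.getsockname()[1]


port = find_free_port()
```

The port is released as soon as the `with` block exits, so another process can still claim it before the caller binds. That window is why a later commit in this PR lets Docker assign the port instead.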
    parser.addoption(
        "--map_tile_url",
        action="store",
        default=None,
        help="Tile server URL for map site (e.g., http://localhost:8080)",
    )
--map_tile_url help/example uses http://localhost:8080, but the provided docker-compose.yml publishes the map container's Apache (tiles + proxies) on WA_MAP_PORT (default 3030) only. With the default compose setup, http://localhost:8080 won’t be reachable, so this example/option description should match the actual published port (likely the same as --map_url).
    __all__ = [
        "MAGENTO_ADMIN_AUTO_LOGIN_HEADER",
        "SiteInstanceHandler",
        # Re-export subpackages for convenient access
        "container",
        "env_ctrl_client",
        "setup",
    ]
__all__ includes "container", "env_ctrl_client", and "setup", but those names are not imported/defined in this module. from webarena_verified.environments import * will raise AttributeError for any of these entries that cannot be imported as a submodule of this package. Either import the subpackages here explicitly (e.g., from . import container, env_ctrl_client, setup) so the re-export does not depend on that implicit lookup, or remove them from __all__ and adjust the comment about re-exporting.
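Whether the star-import actually fails depends on what the listed names are. For a package, the import system tries to import each unresolved `__all__` entry as a submodule before binding names. The sketch below uses two throwaway packages (`demo_ok` and `demo_bad` are hypothetical names, not part of this PR) to show both outcomes: an entry that matches a real submodule is imported on demand, while an entry that is neither defined nor a submodule raises `AttributeError`.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_package(root, name, all_names):
    """Create a tiny package with one submodule and the given __all__."""
    pkg_dir = root / name
    pkg_dir.mkdir()
    (pkg_dir / "container.py").write_text("NAME = 'container'\n")
    (pkg_dir / "__init__.py").write_text(
        "class SiteInstanceHandler:\n"
        "    pass\n"
        f"__all__ = {all_names!r}\n"
    )


root = Path(tempfile.mkdtemp())
sys.path.insert(0, str(root))
make_package(root, "demo_ok", ["SiteInstanceHandler", "container"])
make_package(root, "demo_bad", ["SiteInstanceHandler", "no_such_name"])
importlib.invalidate_caches()

# "container" is not imported in __init__.py, but it exists as a submodule,
# so the star-import loads it implicitly.
ok_namespace = {}
exec("from demo_ok import *", ok_namespace)

# "no_such_name" is neither an attribute nor a submodule, so binding it fails.
try:
    exec("from demo_bad import *", {})
    bad_error = None
except AttributeError as exc:
    bad_error = str(exc)
```

Importing the subpackages explicitly in `__init__.py` removes the dependence on this implicit lookup and makes plain attribute access on the package work without a prior star-import.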
Fixes from Copilot review:
- Fix _get_service_url() to read port from env vars
- Fix Shopping Admin port in READMEs (6680 -> 7780)
- Add --map_env_ctrl_url option and add map to env-ctrl tests
- Fix Wikipedia search test to assert visibility
- Add timeout to map playwright tests
- Remove overly broad HTTPError handling in map test
- Fix gitlab entrypoint.sh set -u crash on early signal
- Fix SO_REUSEADDR order in network_utils (before bind)
- Fix race condition in container port allocation (let Docker assign)
CI changes:
- Enable test-docker-gitlab.yml workflow
- Remove obsolete .disabled workflow files
Co-Authored-By: Claude Opus 4.5 <[email protected]>
…ation
- Add env-ctrl init to gitlab entrypoint (was missing)
- Change wait default to True, add --no-wait flag
- Pass WA_ENV_CTRL_EXTERNAL_SITE_URL env var to docker run (no double init)
- Replace _wait_and_configure with _wait_for_ready (just polls, no init call)
- Add health_check_path to ContainerConfig for external URL polling
- Remove duplicate volumes field from ContainerConfig (derive from setup.volumes)
- Make port and env_ctrl_port mandatory in manager.start()
- Accept 4xx responses as "site is up" in external URL polling
- Fix ruff/ty issues
Co-Authored-By: Claude Opus 4.5 <[email protected]>
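The readiness polling described in this commit, where a 4xx response counts as "site is up", can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's `_wait_for_ready`: the function name `wait_for_ready`, its parameters, the `/health` path, and the stub server are all hypothetical.

```python
import socket
import threading
import time
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Opener that ignores proxy environment variables so localhost is hit directly.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_ready(url, timeout_s=5.0, interval_s=0.1):
    """Poll url until the server answers and return the HTTP status code.

    A successful response or a 4xx both count as "up" because the server
    responded; 5xx and connection errors are retried until the deadline.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with _opener.open(url, timeout=2) as response:
                return response.status
        except urllib.error.HTTPError as exc:
            if 400 <= exc.code < 500:
                return exc.code  # the site is serving, even if it refuses this request
        except OSError:
            pass  # URLError subclasses OSError: refused, reset, or timed out
        time.sleep(interval_s)
    raise TimeoutError(f"{url} not ready after {timeout_s}s")


class ForbiddenStub(BaseHTTPRequestHandler):
    """Hypothetical site that answers every request with 403."""

    def do_GET(self):
        self.send_error(403)

    def log_message(self, *args):  # keep output quiet
        pass


server = HTTPServer(("127.0.0.1", 0), ForbiddenStub)
threading.Thread(target=server.serve_forever, daemon=True).start()
up_status = wait_for_ready(f"http://127.0.0.1:{server.server_port}/health")
server.shutdown()
server.server_close()

# Reserve and release a port so nothing is listening on it.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.bind(("127.0.0.1", 0))
    closed_port = s.getsockname()[1]

try:
    wait_for_ready(f"http://127.0.0.1:{closed_port}/", timeout_s=0.5)
    down_detected = False
except TimeoutError:
    down_detected = True
```

Accepting 4xx is reasonable for a liveness poll, where the question is only whether the server responds. It is a different situation from the integration tests, where a 4xx on the endpoint under test should fail.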
- Add CLI examples for env start/stop/status commands
- Keep Docker direct commands as alternative
- Update port mappings to include env-ctrl ports
Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Replace invoke/docker-compose examples with webarena-verified CLI
- Keep Docker direct commands as alternative
- Update Quick Start sections for all sites
- Simplify data setup instructions for wikipedia and map
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Co-Authored-By: Claude Opus 4.5 <[email protected]>
* Add Docker site environments and integration tests
  - Add Dockerfiles and configuration for 6 sites: gitlab, map, reddit, shopping, shopping_admin, wikipedia
  - Add docker-compose.yml for orchestrating all services
  - Add integration tests with Playwright for each site
  - Add dev utilities for logging, git, network, and path operations
  - Add environment settings and tasks for building/managing containers
  - Move contributing code to dev directory
  Co-Authored-By: Claude Opus 4.5 <[email protected]>
* Reorganize dev tasks and clean up structure
  - Split dev tasks into category files: code_tasks, data_tasks, docs_tasks, env_tasks
  - Move docker_build to top-level task in tasks.py
  - Move monitoring config to assets/environments/monitoring/
  - Remove template-dependent tasks and dev/templates/
  - Add Docker sites CI workflow
  Co-Authored-By: Claude Opus 4.5 <[email protected]>
* Update environment Docker documentation
  - Fix site names from hyphens to underscores (shopping_admin not shopping-admin)
  - Update Available Sites table with all 6 sites including Map
  - Add Env-Ctrl ports column to tables
  - Fix image names to current convention (am1n3e/webarena-verified-<site>)
  - Update directory structure from contributing/ to dev/environments/
  - Add Docker Compose quick start instructions
  - Add Data Management commands (data-download, setup)
  - Update Base Image Pipeline with correct script names
  - Add Environment Variables reference for Docker Compose
  Co-Authored-By: Claude Opus 4.5 <[email protected]>
* Simplify invoke task names by removing redundant prefixes
  Renamed tasks to avoid namespace-prefixed names:
  - dev.docs.docs-serve → dev.docs.serve
  - dev.docs.docs-build → dev.docs.build
  - dev.docs.docs-deploy → dev.docs.deploy
  - dev.code.code-format-and-check → dev.code.format
  - dev.data.data-format → dev.data.format
  - dev.env.env-init → dev.env.init
  - demo.demo-gitlab-start → demo.gitlab-start
  - demo.demo-gitlab-stop → demo.gitlab-stop
  Co-Authored-By: Claude Opus 4.5 <[email protected]>
* Add per-environment documentation files
  - Create dedicated doc pages for each environment (shopping_admin, shopping, reddit, gitlab, wikipedia, map)
  - Move shared Docker info to index.md (size improvements, env vars, commands)
  - Add announcement about Docker images availability to README
  - Update map.md to explain single-container optimization vs original 5 containers
  - Remove docker_images.md (content redistributed)
  Co-Authored-By: Claude Opus 4.5 <[email protected]>
* Add CI tasks and reorganize dev README files
  - Add dev/ci_tasks.py for CI-related invoke tasks
  - Add Dockerfile.ci for Wikipedia environment
  - Move site README files from docker_overrides/ to sites/ level
  - Update GitHub workflow and gitignore
  - Update tasks.py imports
  Co-Authored-By: Claude Opus 4.5 <[email protected]>
* Reorder Quick Start to prioritize uvx over Docker and pip
  Co-Authored-By: Claude Opus 4.5 <[email protected]>
* Update README and disable Docker sites workflow
  - Add section showing how to run WebArena environments with docker run
  - Remove map NOTES.md
  - Disable test-docker-sites.yml workflow temporarily
  Co-Authored-By: Claude Opus 4.5 <[email protected]>
* Add CI test data support for Wikipedia and Map sites
  - Add dev.ci.setup-wikipedia task to download small Ray Charles ZIM (~2.7MB)
  - Add dev.ci.generate-map-data task to generate Monaco test data
  - Add --data-dir parameter to envs.docker.test for mounting CI data
  - Update Wikipedia tests to work with both small and full ZIM files
  - Add Map CI tests for Monaco data
  - Split CI workflows into one per site (only wikipedia enabled for testing)
  - Remove Wikipedia Dockerfile.ci (use normal build with data mount)
  - Store CI data in data/ directory at repo root
  Usage:
    inv dev.ci.setup-wikipedia
    inv envs.docker.build --site=wikipedia --tag=test
    inv envs.docker.test --site=wikipedia --tag=test --data-dir=data/wikipedia
  Co-Authored-By: Claude Opus 4.5 <[email protected]>
* Add container management and setup CLI commands
  Port container start/stop functionality from dev/ to src/ and add new setup commands for Docker volume management.
  New CLI commands:
  - `env start/stop/status --site <name>` - Manage Docker containers
  - `env start --port/--env-ctrl-port` - Custom port mapping
  - `env setup init --site --data-dir` - Download data and create volumes
  - `env setup clean --site --force` - Remove Docker volumes
  New modules:
  - environments/container/ - ContainerManager, defaults, utilities
  - environments/setup/ - Volume setup orchestration, Docker operations
  Config changes:
  - Added ContainerConfig, ContainerSetupConfig, ContainerVolumeSpec types
  - Added optional `container` field to EnvironmentConfig
  Co-Authored-By: Claude Opus 4.5 <[email protected]>
* Refactor container management based on PR review
  - Create ContainerBackend Protocol with DockerBackend implementation
  - Move container status types to types/container.py as Pydantic models
  - Move DEFAULT_CONTAINER_CONFIGS to environments/container/config.py
  - Use pre-computed volume names (webarena_verified_*) instead of suffix
  - Use keyword-only arguments (*,) throughout container APIs
  - Add hostname parameter to ContainerManager with default "localhost"
  - Simplify defaults.py to re-export from config.py
  Co-Authored-By: Claude Opus 4.5 <[email protected]>
* Split backend into folder with one file per class
  - Create backend/protocol.py with ContainerBackend Protocol
  - Create backend/docker.py with DockerBackend implementation
  - Create backend/__init__.py with re-exports and get_default_backend
  - Remove defaults.py, import directly from config.py
  Co-Authored-By: Claude Opus 4.5 <[email protected]>
* Remove __all__ from non-__init__ files
  Keep __all__ only in __init__.py files per convention.
  Co-Authored-By: Claude Opus 4.5 <[email protected]>
* Remove patches directory and related infrastructure
  The PatchManager class and patches directory are no longer needed as patching functionality has been moved to the container initialization process. This removes dead code and simplifies the codebase.
  Co-Authored-By: Claude Opus 4.5 <[email protected]>
* Remove Gatus monitoring service
  The health monitoring dashboard added complexity without providing sufficient value for local development workflows.
  Co-Authored-By: Claude Opus 4.5 <[email protected]>
* Add CI workflows for shopping, shopping_admin, and reddit
  Enables automated testing when changes are made to these Docker environment sites or their integration tests.
  Co-Authored-By: Claude Opus 4.5 <[email protected]>
* Address PR review comments and enable GitLab CI
  Fixes from Copilot review:
  - Fix _get_service_url() to read port from env vars
  - Fix Shopping Admin port in READMEs (6680 -> 7780)
  - Add --map_env_ctrl_url option and add map to env-ctrl tests
  - Fix Wikipedia search test to assert visibility
  - Add timeout to map playwright tests
  - Remove overly broad HTTPError handling in map test
  - Fix gitlab entrypoint.sh set -u crash on early signal
  - Fix SO_REUSEADDR order in network_utils (before bind)
  - Fix race condition in container port allocation (let Docker assign)
  CI changes:
  - Enable test-docker-gitlab.yml workflow
  - Remove obsolete .disabled workflow files
  Co-Authored-By: Claude Opus 4.5 <[email protected]>
* Refactor container startup: simplify wait logic and fix config duplication
  - Add env-ctrl init to gitlab entrypoint (was missing)
  - Change wait default to True, add --no-wait flag
  - Pass WA_ENV_CTRL_EXTERNAL_SITE_URL env var to docker run (no double init)
  - Replace _wait_and_configure with _wait_for_ready (just polls, no init call)
  - Add health_check_path to ContainerConfig for external URL polling
  - Remove duplicate volumes field from ContainerConfig (derive from setup.volumes)
  - Make port and env_ctrl_port mandatory in manager.start()
  - Accept 4xx responses as "site is up" in external URL polling
  - Fix ruff/ty issues
  Co-Authored-By: Claude Opus 4.5 <[email protected]>
* Update README with CLI commands for environment management
  - Add CLI examples for env start/stop/status commands
  - Keep Docker direct commands as alternative
  - Update port mappings to include env-ctrl ports
  Co-Authored-By: Claude Opus 4.5 <[email protected]>
* Update environment docs with CLI commands
  - Replace invoke/docker-compose examples with webarena-verified CLI
  - Keep Docker direct commands as alternative
  - Update Quick Start sections for all sites
  - Simplify data setup instructions for wikipedia and map
  Co-Authored-By: Claude Opus 4.5 <[email protected]>
* Updated entrypoint
* Mark Map site as beta in announcement
  Co-Authored-By: Claude Opus 4.5 <[email protected]>
---------
Co-authored-by: Claude Opus 4.5 <[email protected]>
Adds optimized Docker images to DockerHub for all WebArena sites, eliminating the need to run lengthy optimization scripts. Users can now simply pull pre-built images and start environments immediately.
Highlights
Added
- Docker site configurations (dev/environments/docker/sites/)
- Docker Compose (docker-compose.yml)
- Integration tests (tests/integration/environments/)
- Dev utilities (dev/utils/)
  - logging_utils - Rich console output with banners, spinners, tables
  - git_utils, network_utils, path_utils - Helper functions
- Invoke tasks
  - inv compose.up/down - Start/stop Docker Compose services
  - inv envs.docker.start/stop/check - Manage individual containers
  - inv envs.docker.build/pull/publish - Image management
  - inv envs.docker.test - Run integration tests
  - inv envs.docker.create-base-img - Build optimized base images
- CI workflow (.github/workflows/test-docker-sites.yml)
Usage
Start all services (pulls from DockerHub)
Start specific site
Run integration tests