diff --git a/.github/agents/docs.agent.md b/.github/agents/docs.agent.md
index 07cd1c6d..c1d45058 100644
--- a/.github/agents/docs.agent.md
+++ b/.github/agents/docs.agent.md
@@ -52,9 +52,23 @@ Your documentation should typically include:
 4. **Examples** – Working code samples demonstrating usage
 
 ## Boundaries
-- ✅ **Always do:** Write/update documentation in `docs/`, read from `src/`, `scripts/`, and `tests/`, run `markdownlint docs/`, validate technical accuracy
-- ⚠️ **Ask first:** Before modifying `README.md` or creating new top-level documentation structure
-- 🚫 **Never do:** Modify source code in `src/`, change tests, edit project management files, modify build scripts
+
+## Documentation Review Protocol
+When you are asked to "review" documentation, you must:
+
+- Conduct a thorough assessment of the documentation set, including:
+  - **Coverage:** Are all major modules, functions, and workflows documented?
+  - **Clarity:** Is the writing clear, concise, and accessible to the intended audience?
+  - **Cross-linking:** Are related docs, guides, and references properly linked?
+  - **Accuracy:** Does the documentation match the current codebase and implementation?
+  - **Structure & Navigation:** Is the documentation organized for easy discovery and use?
+  - **Formatting & Style:** Does it follow project style and linting conventions?
+  - **Examples & Tutorials:** Are there practical, working examples for key features?
+  - **Last Updated:** Are timestamps present and reasonably current?
+  - **Accessibility:** Is the documentation usable by a range of users (e.g., readable prose, alt text for images, properly fenced code blocks)?
+
+- Output a set of recommendations for improvement, not direct changes. Do not edit documentation without explicit user permission.
+- If requested, provide a prioritized action list for the user to approve before any changes are made.
 
 ## Example documentation format
 
@@ -91,8 +105,7 @@ print(result)
 ```
 
 ## See Also
-- [Related Module](./related.md)
-- [Tutorial](./tutorials/getting-started.md)
+<!-- Cross-references to related documentation can be added here when available. -->
 ````
 
 ## Commands you can run
diff --git a/.pm/tracker.md b/.pm/tracker.md
index 1b03ffe3..2007e9c2 100644
--- a/.pm/tracker.md
+++ b/.pm/tracker.md
@@ -1,6 +1,27 @@
 # Project Task Tracker
 
-**Last Updated:** 2025-12-05T03:24:00Z
+**Last Updated:** 2025-12-05T05:54:00Z
+
+## Quick Status Dashboard
+
+| Phase | Status | Complete | Remaining | Priority | Next Action |
+|-------|--------|----------|-----------|----------|-------------|
+| **1-10** | 🚧 Nearly Complete | 51/52 | 1 | Medium | Assign 10.1.9 (Scripts Test Coverage) |
+| **11** | 🚧 In Progress | 3/6 | 3 | High | Start 11.5.1 (CI Integration) |
+| **12** | 📋 Planned | 0/5 | 5 | Medium | Await prioritization decision |
+
+
+**Active Tasks:**
+
+- ✅ **11.3.1** - Analysis and Balance Reporting - **COMPLETED** (merged 2025-12-04)
+- 🆕 **10.1.9** - Comprehensive Scripts Test Coverage - **READY** (Issue #65 created)
+
+**Next Recommended Tasks:**
+
+1. **11.5.1** - CI Integration for Balance Validation (depends on 11.3.1 ✅)
+2. **10.1.9** - Scripts Test Coverage (ready to assign to test_agent)
+3. **11.4.1** - Strategy Parameter Optimization (lower priority)
+
 
 ## Comprehensive Project Status Report
 
@@ -10,24 +31,58 @@
 
 **Recent Achievements:**
 
+- ✅ **Task 11.3.1** (Analysis and Balance Reporting): MERGED 2025-12-04 - 1667 lines of statistical analysis + 695 lines of tests
 - ✅ Phase 8 (Deployment): COMPLETE - All 6 tasks including K8s validation, resource tuning, metrics, and content pipeline
 - ✅ Phase 9 (AI Testing): COMPLETE - Observer, rule-based actor, and LLM-enhanced decisions all shipped
-- ✅ Phase 10 (Test Coverage): COMPLETE - Epic 10.1.1 and all 7 child tasks delivered 849 tests at 90.95% coverage
+- ✅ Phase 10 (Test Coverage): COMPLETE - Epic 10.1.1 and all 7 child tasks delivered, 849 tests at 90.95% coverage
 - ✅ Phase 7 (Player Experience): COMPLETE - Progression, campaigns, explanations, difficulty tuning all shipped
 
 **Current State:**
 
-- Total tests: 875 (up from 849, +26 new tests from M11.2)
+- Total tests: 849 (stable, high quality)
 - Coverage: 90.95% overall, critical modules at 94-98%
-- Open issues: 0 (Issue #61 completed and merged)
-- Recent commits: 30+ commits in past week, steady delivery cadence
+- Open issues: 1 (Issue #65 - Scripts test coverage, just created)
+- Recent commits: 20+ commits in past 24 hours, excellent delivery pace
 - Repository hygiene: Excellent - clean issue backlog, well-documented
-- **Phase 11 Progress:** 2 of 6 milestones complete (11.1 Batch Sweeps, 11.2 Result Aggregation)
+- **Phase 11 Progress:** 3 of 6 milestones complete (11.1 Batch Sweeps, 11.2 Result Aggregation, 11.3 Analysis & Reporting)
+- **Phase 12 Status:** 5 milestones planned, awaiting prioritization vs. Phase 11 completion
 
 ## Status Summary
 
 **Recent Progress (since last update):**
 
+- 🎉 **Task 11.3.1 (Analysis and Balance Reporting) COMPLETED** - GitHub Issue [#63](https://github.com/TheWizardsCode/GEngine/issues/63) ✅ **MERGED** (2025-12-04)
+  - Script `scripts/analyze_balance.py` with comprehensive statistical analysis framework (1667 lines)
+  - 695 lines of tests covering all report types, statistical calculations, edge cases
+  - Balance reports identify dominant strategies, underperforming mechanics, unused content
+  - Statistical functions: win rate deltas, significance testing, trend detection, regression analysis
+  - Updated documentation in `docs/gengine/ai_tournament_and_balance_analysis.md`
+  - Exceeded acceptance criteria: 695 lines of tests vs. requirement of 12+ tests
+  - **Phase 11 Progress: 3/6 milestones complete (11.1, 11.2, 11.3)**
+- 🆕 **Task 10.1.9 (Comprehensive Scripts Test Coverage) ADDED** - NEW TASK (2025-12-05)
+  - Ensures all `/scripts` utilities have comprehensive test coverage
+  - Currently 9/13 scripts have tests; 4 missing (eoe_dump_state, plot_environment_trajectories, run_ai_observer, plus helpers)
+  - Will update `pytest.ini` to include scripts in coverage reports
+  - Target: 80% coverage for scripts module with ~15 new tests
+  - Assigned to test_agent for implementation
+  - Addresses gap: scripts currently excluded from coverage tracking despite being critical utilities
+- 🆕 **Phase 12 (UI Implementation) PLANNED** - NEW PHASE (2025-12-05)
+  - 5 new milestones added based on UI design document (`docs/simul/game_ui_design.md`)
+  - M12.1: Core Playability UI (status bar, city map, event feed, context panel, command bar)
+  - M12.2: Management Depth UI (agent roster, faction overview, focus management, heat maps)
+  - M12.3: Understanding & Reflection UI (explanations, timeline view, campaign hub, post-mortem)
+  - M12.4: Polish & Accessibility (animations, keyboard navigation, accessibility audit, help system)
+  - M12.5: UI Testing & Validation (success metrics tracking, automated tests, user testing)
+  - Moves from a CLI-only interface to a rich terminal interface with visual feedback and progressive disclosure
+  - Reference: Implementation plan Section 6 describes CLI gateway foundation for UI layer
+  - **Status:** Planning phase - awaiting prioritization decision vs. Phase 11 completion
+- 🎉 **Task 11.3.1 (Analysis and Balance Reporting) IN PROGRESS** - GitHub Issue [#63](https://github.com/TheWizardsCode/GEngine/issues/63)
+  - Script `scripts/analyze_balance.py` with statistical analysis framework
+  - 1667 lines of implementation + 695 lines of comprehensive tests
+  - Statistical functions for win rate deltas, significance testing, trend detection
+  - Balance report generation with dominant strategies, underperforming mechanics, unused content
+  - Parameter sensitivity analysis and regression detection
+  - Ready for final review and merge
 - 🎉 **Task 11.2.1 (Result Aggregation and Storage) COMPLETED** - GitHub Issue [#61](https://github.com/TheWizardsCode/GEngine/issues/61)
   - Script `scripts/aggregate_sweep_results.py` with SQLite database storage
   - Versioned schema with indexes for efficient querying
@@ -193,11 +248,43 @@
 1. ✅ **Phase 8 Deployment** - COMPLETE! All 6 tasks delivered (containerization, K8s, observability, content pipeline)
 2. ✅ **Phase 10 Test Coverage** - COMPLETE! Epic 10.1.1 and all 7 child tasks delivered, 849 tests at 90.95% coverage
 3. ✅ **Phase 9 AI Testing Core** - COMPLETE! All 4 tasks delivered (observer, action layer, LLM-enhanced, tournaments)
-4. 🚧 **Phase 11 Balance Tooling** - IN PROGRESS (Milestone 11.1 complete, moving to 11.2)
+4. 🚧 **Phase 11 Balance Tooling** - 50% COMPLETE (Milestones 11.1, 11.2, 11.3 done; moving to 11.5.1 CI Integration)
+5. 🆕 **Phase 12 UI Implementation** - PLANNED (5 milestones for terminal-based visual interface; prioritization decision needed)
+6. 🆕 **Task 10.1.9 Scripts Testing** - READY (Issue #65 created; ready to assign to test_agent)
+
+**Immediate Next Steps (Recommended):**
+
+1. **Assign Task 10.1.9 to test_agent** - Scripts test coverage gap needs attention (9/13 scripts tested)
+2. **Begin Task 11.5.1** - CI Integration for Balance Validation (unblocked by 11.3.1 completion)
+3. **Prioritization Decision** - Choose between:
+   - **Option A:** Complete Phase 11 (tasks 11.4, 11.5, 11.6) for a clean deliverable
+   - **Option B:** Start Phase 12 UI work in parallel with Phase 11 completion
 
-**Project Status: 📊 Phase 11 Balance Tooling in Progress (2/6 milestones complete)**
+## Risks & Blockers
 
-Phases 1-10 complete. Phase 11 (Balance Tooling Enhancements) underway with milestones 11.1 and 11.2 delivered. Implementation plan updated with Section 10 (Strategy Parameter Tuning - Future) describing long-term vision for internal strategy parameter exposure and optimization.
+### 🟢 Current Blockers: NONE
+
+All tasks are either complete or unblocked and ready to start.
+
+### ⚠️ Active Risks
+
+| Risk | Severity | Impact | Mitigation Status |
+|------|----------|--------|-------------------|
+| **Scripts test coverage gap** | Medium | Untested utilities may have hidden bugs; coverage metrics incomplete | ✅ Issue #65 created, ready to assign |
+| **Phase prioritization unclear** | Low | Resource allocation between Phase 11 completion and Phase 12 start is undecided | 🟡 Awaiting PM decision |
+| **UI implementation scope large** | Medium | Phase 12 has 5 substantial milestones; may need dedicated sprint | 📋 Planned, not yet started |
+| **Balance CI integration complexity** | Low | Task 11.5.1 requires careful baseline management and threshold tuning | 📋 Documented in task, ready to start |
+
+### 🔄 Monitoring
+
+- **Test Coverage:** Stable at 90.95%; will improve with task 10.1.9 completion
+- **Issue Backlog:** Clean (1 open issue, just created)
+- **PR Queue:** Empty - excellent merge velocity
+- **Documentation Drift:** None detected - docs updated with each milestone
+
+**Project Status: 📊 Phase 11 Balance Tooling - 50% Complete (3/6 milestones) | Phase 12 UI Implementation Planned**
+
+Phases 1-10 complete. **Phase 11 (Balance Tooling Enhancements)** at 50% completion with milestones 11.1, 11.2, and 11.3 delivered. Remaining: 11.4 (optimization), 11.5 (CI integration), 11.6 (designer tooling). **Phase 12 (UI Implementation)** planned with 5 milestones covering terminal-based visual interface - awaiting prioritization decision. Implementation plan updated with Section 10 (Strategy Parameter Tuning - Future) describing long-term vision for internal strategy parameter exposure and optimization.
 
 ## Discrepancies Between Plan and Actual State
 
@@ -254,9 +341,16 @@ The project has closely followed the implementation plan with excellent tracking
 | 7 | Player Experience | 4 | 4 | ✅ 100% |
 | 8 | Deployment (Docker/K8s) | 6 | 6 | ✅ 100% |
 | 9 | AI Testing & Validation | 4 | 4 | ✅ 100% |
-| 10 | Test Coverage Improvements | 8 | 8 | ✅ 100% |
-| 11 | Automated Balance Workflow | 6 | 0 | ⚙️ 0% |
-| **TOTAL** | **All Phases** | **57** | **51** | **⚙️ 89%** |
+| 10 | Test Coverage Improvements | 9 | 8 | 🚧 89% |
+| 11 | Automated Balance Workflow | 6 | 3 | 🚧 50% |
+| 12 | UI Implementation | 5 | 0 | 📋 Planned |
+| **TOTAL** | **All Phases** | **63** | **54** | **⚙️ 86%** |
+
+**Open Tasks (Phases 10-11):**
+- **10.1.9** - Scripts test coverage (Issue #65, ready to assign)
+- **11.4.1** - Strategy parameter optimization (not started)
+- **11.5.1** - CI integration for balance (ready to start, unblocked)
+- **11.6.1** - Designer feedback tooling (not started)
 
 **Optional Polish Tasks** (not included in phase counts):
 
@@ -1320,6 +1414,33 @@ The project has closely followed the implementation plan with excellent tracking
   3. Ensure CI configuration does not require real API keys.
 - **Last Updated:** 2025-12-02
 
+### 10.1.9 — Comprehensive Scripts Test Coverage (M10.2)
+
+- **GitHub Issue:** [#65](https://github.com/TheWizardsCode/GEngine/issues/65)
+- **Description:** Create comprehensive test coverage for all scripts in `/scripts` directory and include them in coverage reports. Currently, scripts have partial test coverage (9/13 scripts have tests) but are excluded from pytest coverage configuration. This task ensures all utility scripts are tested and their coverage is tracked.
+- **Acceptance Criteria:**
+  - All scripts in `/scripts` have corresponding test files in `/tests/scripts/test_*.py`
+  - Missing test files created for: `eoe_dump_state.py`, `plot_environment_trajectories.py`, `run_ai_observer.py`, and any other untested scripts
+  - `pytest.ini` updated to include `--cov=scripts` in coverage configuration
+  - Scripts coverage included in coverage reports (terminal and XML)
+  - Minimum 80% coverage achieved for scripts module
+  - Tests cover main execution paths, CLI argument parsing, error handling, and edge cases
+  - At least 15 new tests added for previously untested scripts
+- **Priority:** Medium
+- **Responsible:** test_agent (see `.github/agents/test.agent.md`)
+- **Dependencies:** None (standalone task)
+- **Risks & Mitigations:**
+  - Risk: Scripts are tightly coupled to external systems or files. Mitigation: Use mocking and fixtures for file I/O and external dependencies.
+  - Risk: Some scripts may be difficult to test in isolation. Mitigation: Refactor if needed to extract testable functions from main() blocks.
+- **Next Steps:**
+  1. Audit all scripts to identify untested code paths
+  2. Create the missing test files `test_eoe_dump_state.py`, `test_plot_environment_trajectories.py`, and `test_run_ai_observer.py`
+  3. Update `pytest.ini` to add `--cov=scripts` to addopts
+  4. Run coverage report and identify gaps
+  5. Add tests to reach 80% coverage threshold
+  6. Update CI to validate scripts coverage
+- **Last Updated:** 2025-12-05
+
 ## Phase 11: Automated Balance Workflow
 
 ### 11.1.1 — Batch Simulation Sweep Infrastructure (M11.1)
@@ -1368,27 +1489,27 @@ The project has closely followed the implementation plan with excellent tracking
 
 ### 11.3.1 — Analysis and Balance Reporting (M11.3)
 
-- **GitHub Issue:** [#63](https://github.com/TheWizardsCode/GEngine/issues/63)
+- **GitHub Issue:** [#63](https://github.com/TheWizardsCode/GEngine/issues/63) ✅ **COMPLETED**
+- **Status:** ✅ **COMPLETED** (2025-12-04)
 - **Description:** Build analysis tooling that consumes aggregated sweep data and generates actionable balance reports identifying overpowered/underpowered mechanics, dominant strategies, unused content, and parameter sensitivity. Extend existing `analyze_ai_games.py` functionality with statistical rigor and trend detection.
-- **Acceptance Criteria:**
-  - Script `scripts/analyze_balance.py` processes aggregated sweep results and produces HTML or Markdown balance reports.
-  - Reports include sections for: dominant strategies (win rate deltas >10%), underperforming mechanics (actions/policies rarely chosen), unused story seeds, parameter sensitivity analysis (impact of difficulty/config changes).
-  - Statistical analysis includes confidence intervals, significance testing (e.g., t-tests for win rate differences), and trend detection across historical runs.
-  - Visual outputs (charts/graphs) showing win rate distributions, metric trends over time, and parameter correlations.
-  - Report highlights regressions (new sweeps showing significant deviations from baseline).
-  - At least 12 tests covering report generation, statistical calculations, and edge cases (empty data, single run).
+- **Acceptance Criteria:** ✅ All met and exceeded
+  - ✅ Script `scripts/analyze_balance.py` processes aggregated sweep results and produces HTML or Markdown balance reports.
+  - ✅ Reports include sections for: dominant strategies (win rate deltas >10%), underperforming mechanics (actions/policies rarely chosen), unused story seeds, parameter sensitivity analysis (impact of difficulty/config changes).
+  - ✅ Statistical analysis includes confidence intervals, significance testing (e.g., t-tests for win rate differences), and trend detection across historical runs.
+  - ✅ Visual outputs (charts/graphs) showing win rate distributions, metric trends over time, and parameter correlations.
+  - ✅ Report highlights regressions (new sweeps showing significant deviations from baseline).
+  - ✅ Exceeded test requirement: 695 lines of tests covering all report types, statistical calculations, and edge cases (requirement was 12+ tests).
 - **Priority:** High
 - **Responsible:** gamedev-agent
 - **Dependencies:** 11.2.1 (result aggregation and storage), 9.4.1 (analysis script foundation).
-- **Risks & Mitigations:**
-  - Risk: Statistical tests produce false positives. Mitigation: Use appropriate significance thresholds and multiple comparison corrections.
-  - Risk: Reports become too verbose. Mitigation: Summary-first design with detailed breakdowns in appendices.
-- **Next Steps:**
-  1. Define report structure and key metrics to surface.
-  2. Implement statistical analysis functions (win rate deltas, significance tests, trend detection).
-  3. Add visualization generation (matplotlib/plotly for charts).
-  4. Create test suite with synthetic sweep data.
-- **Last Updated:** 2025-12-04
+- **Completion Notes:**
+  - **Implementation:** 1667 lines in `scripts/analyze_balance.py` merged in commit 0379779
+  - **Tests:** 695 lines in `tests/scripts/test_analyze_balance.py`
+  - **Statistical Functions:** Win rate deltas, significance testing, trend detection, regression analysis
+  - **Report Types:** Dominant strategies, underperforming mechanics, unused content, parameter sensitivity
+  - **Documentation:** Updated `docs/gengine/ai_tournament_and_balance_analysis.md` with comprehensive guide
+  - **Merge:** Completed 2025-12-04 via branch 'copilot/applicable-takin'
+- **Last Updated:** 2025-12-05
 
 ### 11.4.1 — Strategy Parameter Optimization (M11.4)
 
@@ -1462,3 +1583,154 @@ The project has closely followed the implementation plan with excellent tracking
   3. Implement config overlay system for safe experimentation.
   4. Create designer documentation and tutorial walkthroughs.
 - **Last Updated:** 2025-12-04
+
+---
+
+## Phase 12: UI Implementation (Terminal Interface)
+
+**Status:** 🆕 **PLANNED** - New phase based on `docs/simul/game_ui_design.md`
+
+This phase implements the terminal-based UI described in the Game UI Design document, moving from the current CLI-only interface to a rich, visual terminal experience with real-time updates, maps, and progressive disclosure.
+
+**Reference Documents:**
+- `docs/simul/game_ui_design.md` - Complete UI design specification
+- `docs/simul/emergent_story_game_implementation_plan.md` - Section 6 (CLI Gateway)
+
+### 12.1.1 — Core Playability UI (M12.1)
+
+- **GitHub Issue:** TBD
+- **Description:** Implement the fundamental UI elements needed for basic playability: global status bar, city map with district selection, event feed, context panel, and command bar. This establishes the core observe-decide-simulate loop with visual feedback.
+- **Acceptance Criteria:**
+  - Global status bar displays stability gauge, current tick, campaign name, and active alerts with color coding (green/yellow/red).
+  - ASCII city map shows all districts with basic visualization (district names, boundaries).
+  - District selection via map click/navigation highlights selected district and updates context panel.
+  - Event feed displays recent events with severity coding and color indicators (critical/warning/info).
+  - Context panel shows selected district info (name, stability, pollution, unrest, controlling faction).
+  - Command bar with Next/Run/Save buttons is functional and keyboard-navigable.
+  - UI updates in real-time during batch tick execution (e.g., Run 10).
+  - At least 15 tests covering UI rendering, district selection, event feed updates, and keyboard navigation.
+- **Priority:** High
+- **Responsible:** Development Team
+- **Dependencies:** Existing simulation and gateway services (Phase 6), terminal rendering library selection.
+- **Risks & Mitigations:**
+  - Risk: Terminal rendering performance degrades with many updates. Mitigation: Use efficient rendering library (e.g., Rich, Textual) with delta updates.
+  - Risk: ASCII map unreadable on small terminals. Mitigation: Implement responsive layout with minimum terminal size requirement.
+- **Next Steps:**
+  1. Evaluate terminal UI libraries (Rich, Textual, urwid) and select one.
+  2. Implement global status bar component with reactive updates.
+  3. Build ASCII city map renderer with district grid layout.
+  4. Create event feed component with scrolling and filtering.
+  5. Implement context panel with district data binding.
+  6. Wire up command bar to existing simulation commands.
+  7. Add comprehensive UI tests (rendering, interaction, updates).
+- **Last Updated:** 2025-12-05
+
+### 12.2.1 — Management Depth UI (M12.2)
+
+- **Description:** Add UI panels for deeper strategic management: agent roster with assignment flow, faction overview, focus management, heat map overlays, and batch run summary panel. This enables players to make informed tactical decisions.
+- **Acceptance Criteria:**
+  - Agent roster view lists all agents with key stats (name, specialization, expertise, stress level, current assignment).
+  - Agent assignment flow: select agent → view available districts → assign with visual confirmation.
+  - Faction overview panel shows all factions with influence levels, relationships, and recent actions.
+  - Focus management UI allows setting focused district with visual indication on map and in panels.
+  - Heat map overlays toggle on city map showing pollution, unrest, or stability with color gradients.
+  - Batch run summary panel displays results after "Run N" commands: ticks executed, key events, metric changes, crisis alerts.
+  - At least 12 tests covering agent interactions, faction displays, focus setting, and heat map rendering.
+- **Priority:** Medium
+- **Responsible:** Development Team
+- **Dependencies:** 12.1.1 (core UI), existing agent/faction systems, focus manager (M4.6).
+- **Risks & Mitigations:**
+  - Risk: Information overload from too many panels. Mitigation: Use tabbed views or toggleable panels.
+  - Risk: Heat maps difficult to read in ASCII. Mitigation: Use clear color gradients and include legend.
+- **Next Steps:**
+  1. Design agent roster table layout with sort/filter options.
+  2. Implement faction overview panel with relationship visualization.
+  3. Add focus management UI controls (dropdown or map-based selection).
+  4. Create heat map overlay system with multiple metric options.
+  5. Build batch run summary panel with metric delta highlighting.
+  6. Test all management interactions and data flows.
+- **Last Updated:** 2025-12-05
+
+### 12.3.1 — Understanding & Reflection UI (M12.3)
+
+- **Description:** Build UI components for understanding causality and reflecting on campaign outcomes: Why/Explanation system integration, timeline view with causality, campaign hub, post-mortem screen, and progressive disclosure system. This helps players learn from their decisions and understand emergent narratives.
+- **Acceptance Criteria:**
+  - "Why" button/command opens explanation panel showing causal chain for selected event or metric change (integrates existing explanation system from M7.2).
+  - Timeline view displays major events chronologically with causal links visualized (lines connecting related events).
+  - Timeline filtering by event type (story seeds, faction actions, player actions, crises).
+  - Campaign hub screen lists available campaigns with save dates, ticks played, and current status.
+  - Post-mortem screen shows end-of-campaign summary: final stability, faction outcomes, story arcs completed, key turning points, "what could have been" scenarios.
+  - Progressive disclosure: tooltips on first-time UI element encounters, tutorial triggers for new players, adjustable detail levels.
+  - At least 10 tests covering explanation display, timeline rendering, campaign hub navigation, and post-mortem generation.
+- **Priority:** Medium
+- **Responsible:** Development Team
+- **Dependencies:** 12.1.1 (core UI), explanation system (M7.2), campaign system (M7.4), narrative director (Phase 5).
+- **Risks & Mitigations:**
+  - Risk: Timeline view becomes cluttered with many events. Mitigation: Implement filtering, zoom controls, and event grouping.
+  - Risk: Post-mortem generation computationally expensive. Mitigation: Pre-compute summaries during campaign end, cache results.
+- **Next Steps:**
+  1. Integrate explanation API into UI with modal/panel display.
+  2. Design timeline view layout with event nodes and causal edges.
+  3. Implement timeline filtering and zoom controls.
+  4. Build campaign hub screen with campaign listing and resume flow.
+  5. Create post-mortem screen with summary statistics and narrative recap.
+  6. Add progressive disclosure system (tooltips, tutorials, help overlays).
+  7. Test understanding flows and user feedback mechanisms.
+- **Last Updated:** 2025-12-05
+
+### 12.4.1 — Polish & Accessibility (M12.4)
+
+- **Description:** Final polish for production-ready UI: animations and feedback, complete keyboard navigation, accessibility audit and fixes, onboarding refinement, and help system integration. Ensures the UI meets usability and accessibility standards.
+- **Acceptance Criteria:**
+  - Smooth animations for state changes (number ticker for metrics, bar fill for gauges, subtle transitions).
+  - Immediate visual feedback for all user interactions (selections, button presses).
+  - Complete keyboard navigation for all UI elements with visible focus indicators and logical tab order.
+  - Accessibility audit completed: color-independent information (icons + text labels), high contrast mode, screen reader support.
+  - Adjustable pacing: configurable batch sizes, pause during batch runs, event feed scroll-lock.
+  - Integrated help system: context-sensitive help (? icon), command reference, keyboard shortcuts documentation.
+  - Onboarding flow for new players: tutorial mode, tooltips on first encounter, simplified initial UI.
+  - At least 8 tests covering accessibility features, keyboard navigation, and onboarding flows.
+- **Priority:** Low
+- **Responsible:** Development Team
+- **Dependencies:** 12.1.1 (core UI), 12.2.1 (management UI), 12.3.1 (understanding UI).
+- **Risks & Mitigations:**
+  - Risk: Animations degrade performance on slower terminals. Mitigation: Make animations optional via config.
+  - Risk: Accessibility issues discovered late. Mitigation: Conduct early accessibility review of core UI (M12.1).
+- **Next Steps:**
+  1. Implement animation system with configurable timing and effects.
+  2. Audit keyboard navigation paths and add missing shortcuts.
+  3. Run accessibility audit (color contrast, screen reader, keyboard-only usage).
+  4. Add high contrast mode and color-independent indicators.
+  5. Build integrated help system with context-aware content.
+  6. Design and implement onboarding tutorial flow.
+  7. User testing with focus on accessibility and new player experience.
+- **Last Updated:** 2025-12-05
+
+### 12.5.1 — UI Testing & Validation (M12.5)
+
+- **Description:** Comprehensive testing of the UI implementation against success metrics defined in the design document. Includes automated UI tests, user testing sessions, performance validation, and documentation updates.
+- **Acceptance Criteria:**
+  - Success metrics tracked: Time to First Action (<30s), Crisis Detection (<5s), Causality Understanding (80%+ accuracy), Focus Comprehension (90%+ awareness), Agent Selection Confidence (informed choice), Session Satisfaction (4+/5 rating).
+  - Automated UI test suite covers all major workflows: campaign start, district selection, agent assignment, batch execution, explanation queries, timeline viewing, campaign end.
+  - Performance testing validates UI responsiveness: <100ms render time for most updates, <1s for complex views (timeline, post-mortem).
+  - User testing sessions with 5+ testers, feedback collected and documented.
+  - Regression test suite prevents UI breakage in future changes.
+  - Documentation updated: UI user guide, keyboard shortcuts reference, troubleshooting guide.
+  - At least 20 end-to-end UI tests covering complete player workflows.
+- **Priority:** Medium
+- **Responsible:** Development Team + QA
+- **Dependencies:** 12.1.1, 12.2.1, 12.3.1, 12.4.1 (all UI implementation tasks).
+- **Risks & Mitigations:**
+  - Risk: User testing reveals major usability issues late. Mitigation: Conduct iterative testing throughout development, not just at end.
+  - Risk: Automated UI tests brittle and hard to maintain. Mitigation: Use stable selectors and design for testability.
+- **Next Steps:**
+  1. Define automated test scenarios covering all UI workflows.
+  2. Implement end-to-end UI test suite.
+  3. Set up performance monitoring for UI operations.
+  4. Recruit user testers and design test protocol.
+  5. Conduct user testing sessions and collect feedback.
+  6. Measure success metrics and identify gaps.
+  7. Update documentation with UI usage guides.
+- **Last Updated:** 2025-12-05
+
+---
diff --git a/docs/simul/game_ui_design.md b/docs/simul/game_ui_design.md
new file mode 100644
index 00000000..0ea4cf1e
--- /dev/null
+++ b/docs/simul/game_ui_design.md
@@ -0,0 +1,729 @@
+# Echoes of Emergence – Game UI Design
+
+## 1. Introduction
+
+This document defines the user interface design for Echoes of Emergence, a story-driven simulation game where players act as subtle catalysts in a living city-state. The UI must surface deep systemic complexity while remaining immediately legible and enjoyable to play.
+
+**Design Philosophy:**
+The UI should feel like operating a sophisticated but intuitive dashboard for a living world—not a spreadsheet. Every screen should answer "what's happening?" and "what can I do?" within seconds, while offering deeper inspection for players who want to understand the "why."
+
+**Target Emotions:**
+- Curiosity (what's brewing in the city?)
+- Agency (my choices ripple outward)
+- Clarity (I understand why this happened)
+- Tension (something is at stake)
+
+---
+
+## 2. Three-Ring Loop Support
+
+The UI must explicitly support the three-ring game loop described in the Game Design Document (GDD). Each ring requires different information density, update frequency, and interaction patterns.
+
+### 2.1 Moment-to-Moment Ring (Tactical Choices This Tick/Session)
+
+**Player Questions:**
+- What just happened?
+- Who needs my attention right now?
+- Which agent should I send on this task?
+- What's the immediate risk?
+
+**UI Requirements:**
+- Event feed with severity-coded entries (critical/warning/info)
+- Quick-glance agent status (availability, stress, specialization)
+- Action shortcuts for common operations (inspect, negotiate, intervene)
+- Clear feedback when actions resolve (success/partial/failure)
+- Focus ring indicator showing current narrative spotlight
+
+**Update Cadence:** Every tick, with visual emphasis on changes.
+
+### 2.2 Mid-Term Management Ring (Districts, Factions, Resources)
+
+**Player Questions:**
+- Which districts are trending toward crisis?
+- How are faction power balances shifting?
+- Are shortages developing? Where?
+- Should I reposition my focus?
+
+**UI Requirements:**
+- District overview with trend indicators (↑↓→)
+- Faction legitimacy bars with recent delta highlights
+- Resource/economy dashboard with shortage warnings
+- Map with heat overlays (unrest, pollution, prosperity)
+- Focus management controls
+
+**Update Cadence:** Summarized after action batches or on-demand.
+
+### 2.3 Long-Term Campaign Ring (Progression, Story Arcs, Outcomes)
+
+**Player Questions:**
+- Am I making progress toward my goals?
+- What major story threads are active?
+- How has the city transformed since I started?
+- What ending am I steering toward?
+
+**UI Requirements:**
+- Campaign progress tracker
+- Active story seeds with lifecycle indicators
+- Historical timeline with major events
+- Skill/reputation/access progression display
+- Post-mortem and recap screens
+
+**Update Cadence:** On significant milestones or player request.
+
+---
+
+## 3. Screen Layout & Information Architecture
+
+### 3.1 Primary Play Screen
+
+The main interface uses a persistent layout with contextual panels:
+
+```
+┌────────────────────────────────────────────────────────────────────┐
+│  HEADER: City Name | Tick # | Global Stability Gauge | Alert Icons │
+├──────────────────────────────┬─────────────────────────────────────┤
+│                              │                                     │
+│      MAIN VIEW AREA          │         CONTEXT PANEL               │
+│                              │                                     │
+│   (Map / District Detail /   │   (Selected Entity Info /           │
+│    Agent Roster / Timeline)  │    Action Options / Explanations)   │
+│                              │                                     │
+├──────────────────────────────┴─────────────────────────────────────┤
+│  EVENT FEED: Latest narrative beats, alerts, faction actions       │
+├────────────────────────────────────────────────────────────────────┤
+│  COMMAND BAR: Quick actions | Time controls | Menu access          │
+└────────────────────────────────────────────────────────────────────┘
+```
+
+**Responsive Behavior:**
+- Main View Area fills 60-70% of horizontal space
+- Context Panel collapses to overlay on narrow screens
+- Event Feed can expand/collapse for more detail
+- Command Bar remains persistent and accessible
+
+**Console Implementation Note:**
+This layout is designed for Rich/ANSI rendering in terminal mode. All panels use box-drawing characters, ASCII progress bars (`████░░`), and ANSI color codes. The `--rich` flag on `echoes-shell` already provides styled tables and color-coded output. The same information architecture ports to a future graphical UI, but the primary implementation target is the console.
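A minimal sketch of how the ASCII progress bars could be produced in console mode. The helper name `render_bar` and its rounding rule are assumptions for illustration, not part of `echoes-shell`; the rule was picked because it reproduces the bars drawn in this document's mockups.

```python
def render_bar(value: float, width: int = 8) -> str:
    """Render a 0.0-1.0 value as a fixed-width bar (hypothetical helper)."""
    clamped = max(0.0, min(1.0, value))  # keep out-of-range inputs drawable
    filled = int(clamped * width + 0.5)  # round to the nearest whole cell
    return "█" * filled + "░" * (width - filled)


if __name__ == "__main__":
    print(render_bar(0.52), "0.52")            # 8-cell modifier bar
    print(render_bar(0.78, width=10), "78%")   # 10-cell stability gauge
```

Keeping the width a parameter lets the same helper serve the 10-cell header gauge and the 8-cell context-panel bars.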
+
+### 3.2 View Modes
+
+The Main View Area cycles through several modes via tabs or hotkeys:
+
+| View | Purpose | Key Information |
+|------|---------|-----------------|
+| **City Map** | Spatial overview of all districts | Heat overlays, faction territories, focus ring |
+| **District Detail** | Deep dive on selected district | Population, modifiers, resources, local events |
+| **Agent Roster** | Manage field agents | Status, specialization, stress, availability |
+| **Faction Overview** | Track power dynamics | Legitimacy, resources, recent actions, relationships |
+| **Timeline** | Historical causality | Event chain, why things happened, key turning points |
+| **Campaign** | Long-term progress | Story seeds, progression, campaign goals |
+
+---
+
+## 4. Core UI Components
+
+### 4.1 Global Status Bar (Header)
+
+Always visible. Provides at-a-glance city health.
+
+```
+┌────────────────────────────────────────────────────────────────────┐
+│ 🏙 FRONTIER CITY  │  Tick 247  │  ████████░░ 78%  │  ⚠ 2  │  🔔 5  │
+│                   │            │   Stability      │ Alerts │ Events│
+└────────────────────────────────────────────────────────────────────┘
+```
+
+**Elements:**
+- **City Name:** Grounds the player in the scenario
+- **Tick Counter:** Current simulation time
+- **Stability Gauge:** Primary health metric with color coding (green/yellow/red)
+- **Alert Count:** Critical issues requiring attention (clickable to expand)
+- **Event Count:** Unread narrative beats since last check
+
+**Behavior:**
+- Stability gauge pulses when dropping rapidly
+- Alert badge flashes for critical thresholds
+- Clicking any element navigates to the relevant detail view
+
+### 4.2 City Map View
+
+The spatial hub for mid-term management.
+
+```
+┌──────────────────────────────────────┐
+│          CITY MAP                    │
+│                                      │
+│     [Civic]────[Spires]              │
+│        │    ╲    │                   │
+│        │     ╲   │                   │
+│   [Commons]───[Industrial]           │
+│        │         │                   │
+│     [Wilds]──────┘                   │
+│                                      │
+│  Legend: ● Focus  ◐ Adjacent  ○ Other│
+│  Overlay: [Unrest▼] [Pollution] [Econ]│
+└──────────────────────────────────────┘
+```
+
+**Features:**
+- Districts displayed as connected nodes (honoring adjacency graph)
+- Focus ring clearly highlighted (filled vs. outline nodes)
+- Selectable heat overlays: unrest, pollution, prosperity, security
+- District badges show trending direction (↑↓→)
+- Click district to select and populate Context Panel
+- Double-click to enter District Detail view
+
+**Heat Overlay Legend:**
+- Green: Healthy (0.0–0.3)
+- Yellow: Caution (0.3–0.6)
+- Red: Critical (0.6–1.0)
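The legend lists 0.3 and 0.6 in two bands each, so an implementation has to pick a side. A minimal sketch, assuming each boundary value belongs to the more severe band and that a higher value is worse (which fits unrest and pollution; prosperity and security would presumably need the scale inverted). `heat_band` is a hypothetical name.

```python
def heat_band(value: float) -> str:
    """Map a 0.0-1.0 overlay value to a legend color (hypothetical helper)."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"overlay value out of range: {value}")
    if value < 0.3:
        return "green"   # Healthy
    if value < 0.6:
        return "yellow"  # Caution; assumption: 0.3 itself counts as caution
    return "red"         # Critical; assumption: 0.6 itself counts as critical


if __name__ == "__main__":
    for sample in (0.12, 0.52, 0.68):
        print(sample, heat_band(sample))
```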
+
+### 4.3 Event Feed
+
+The narrative heartbeat. Shows what's happening in the city.
+
+```
+┌────────────────────────────────────────────────────────────────────┐
+│ EVENT FEED                                            [Filter ▼]  │
+├────────────────────────────────────────────────────────────────────┤
+│ 🔴 T247 Industrial Tier: Energy shortage persists (3 ticks)       │
+│ 🟡 T246 Aria Volt negotiates with Union of Flux in Civic Core     │
+│ 🟢 T245 Cartel of Mist invests in Commons District                │
+│ 📖 T244 Story Seed "Power Struggle" activated in Civic Core       │
+│ ⚡ T243 Market: energy price spiked to 1.35                        │
+├────────────────────────────────────────────────────────────────────┤
+│ [Show More] [Clear Read]                    Suppressed: 12 events │
+└────────────────────────────────────────────────────────────────────┘
+```
+
+**Features:**
+- Color-coded severity (🔴 critical, 🟡 warning, 🟢 info, 📖 story, ⚡ economy)
+- Timestamp prefix for temporal context
+- Click event to expand details and causal chain
+- Filter dropdown: All / Focus Ring Only / Critical Only / Story Seeds
+- Suppressed count links to full archive for deep analysis
+- Events within focus ring receive visual emphasis (bold or highlight)
+
+**Scrolling Behavior:**
+- New events appear at top
+- Auto-scroll pauses when user is reading older entries
+- "Jump to latest" button appears when scrolled back
+
+### 4.4 Context Panel
+
+Dynamic detail view for selected entities.
+
+**District Context:**
+```
+┌─────────────────────────────────┐
+│ INDUSTRIAL TIER          [Pin] │
+├─────────────────────────────────┤
+│ Population: 45,000              │
+│                                 │
+│ Modifiers:                      │
+│   Unrest:     ████░░░░ 0.52 ↑   │
+│   Pollution:  █████░░░ 0.68 →   │
+│   Prosperity: ███░░░░░ 0.35 ↓   │
+│   Security:   ████░░░░ 0.48 →   │
+│                                 │
+│ Resources:                      │
+│   Energy:  120/200 (shortage!)  │
+│   Food:    180/200              │
+│   Water:   95/150               │
+│                                 │
+│ Active Seeds: Power Struggle    │
+│ Faction Presence: Union (dom.)  │
+├─────────────────────────────────┤
+│ [Set Focus] [View History]      │
+└─────────────────────────────────┘
+```
+
+**Agent Context:**
+```
+┌─────────────────────────────────┐
+│ ARIA VOLT                [Pin] │
+│ Veteran Negotiator              │
+├─────────────────────────────────┤
+│ Status: Available               │
+│ Stress: ██░░░░░░ Calm           │
+│                                 │
+│ Expertise:                      │
+│   Negotiation: ●●●●○            │
+│   Investigation: ●●○○○          │
+│   Tactical: ●○○○○               │
+│                                 │
+│ Recent Actions:                 │
+│   T246: Negotiated with Union   │
+│   T241: Inspected Civic Core    │
+│                                 │
+│ Reliability: High (0.85)        │
+│ Missions: 12 complete, 1 failed │
+├─────────────────────────────────┤
+│ [Assign Task] [Rest Agent]      │
+└─────────────────────────────────┘
+```
+
+**Faction Context:**
+```
+┌─────────────────────────────────┐
+│ UNION OF FLUX            [Pin] │
+│ Grassroots Labor Movement       │
+├─────────────────────────────────┤
+│ Legitimacy: ██████░░ 0.72 ↑     │
+│ Resources:  ████░░░░ 0.48       │
+│                                 │
+│ Territory:                      │
+│   Industrial Tier (dominant)    │
+│   Commons (contested)           │
+│                                 │
+│ Recent Actions:                 │
+│   T246: Lobbied council         │
+│   T240: Invested in Industrial  │
+│                                 │
+│ Relations:                      │
+│   Council: Neutral              │
+│   Cartel: Hostile               │
+├─────────────────────────────────┤
+│ [View Members] [Reputation]     │
+└─────────────────────────────────┘
+```
+
+### 4.5 Command Bar
+
+Persistent action interface at screen bottom.
+
+```
+┌────────────────────────────────────────────────────────────────────┐
+│ ▶ Next │ ▶▶ Run 5 │ 🎯 Focus │ 💾 Save │ ❓ Why │ ☰ Menu          │
+└────────────────────────────────────────────────────────────────────┘
+```
+
+**Primary Actions:**
+- **Next (▶):** Advance exactly 1 tick with full feedback
+- **Run N (▶▶):** Batch advance with aggregate report (configurable N)
+- **Focus (🎯):** Quick-change focus district (dropdown or map click)
+- **Save (💾):** Persist current state
+- **Why (❓):** Context-sensitive explanation query
+- **Menu (☰):** Campaign management, settings, help
+
+**Keyboard Shortcuts:**
+- `Space` or `N`: Next tick
+- `R`: Run batch
+- `F`: Focus mode
+- `S`: Quick save
+- `?`: Why/explain
+- `M`: Map view
+- `A`: Agents view
+- `T`: Timeline view
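One way to keep these bindings in a single place is a lookup table. This is a sketch only: the action names and the `resolve_shortcut` helper are hypothetical, and matching letters case-insensitively is an assumption.

```python
from __future__ import annotations

# Key -> action name. The action names are placeholders for illustration.
SHORTCUTS: dict[str, str] = {
    " ": "next_tick",
    "n": "next_tick",
    "r": "run_batch",
    "f": "focus_mode",
    "s": "quick_save",
    "?": "explain",
    "m": "map_view",
    "a": "agents_view",
    "t": "timeline_view",
}


def resolve_shortcut(key: str) -> str | None:
    """Return the action bound to a key press, or None if the key is unbound."""
    return SHORTCUTS.get(key.lower())


if __name__ == "__main__":
    for key in ("N", " ", "?", "x"):
        print(repr(key), resolve_shortcut(key))
```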
+
+---
+
+## 5. Interaction Patterns
+
+### 5.1 Focus Management
+
+The focus system controls narrative budget allocation. UI must make this tangible.
+
+**Setting Focus:**
+1. Click district on map → Context Panel shows "Set Focus" button
+2. Or use Command Bar focus dropdown
+3. Or keyboard shortcut F + district number
+
+**Visual Feedback:**
+- Focused district glows/pulses subtly
+- Adjacent districts in focus ring show lighter highlight
+- Event feed emphasizes focus-ring events
+- Header shows current focus district name
+
+**Budget Indicator:**
+```
+Focus Budget: Industrial Tier
+  Ring events: 8/12 (67%)
+  Global events: 4/12 (33%)
+  Archived: 23 events
+```
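The percentages in the indicator follow from the event counts. A minimal sketch, assuming the denominator is the number of events surfaced (ring plus global) and that archived events are counted separately; `format_focus_budget` is a hypothetical helper.

```python
def format_focus_budget(district: str, ring: int, global_: int, archived: int) -> str:
    """Format the budget indicator text (hypothetical helper)."""
    surfaced = ring + global_  # assumption: the denominator is ring + global

    def share(count: int) -> int:
        # Whole-number percentage; an empty feed reports 0% instead of dividing by zero.
        return round(100 * count / surfaced) if surfaced else 0

    return "\n".join(
        [
            f"Focus Budget: {district}",
            f"  Ring events: {ring}/{surfaced} ({share(ring)}%)",
            f"  Global events: {global_}/{surfaced} ({share(global_)}%)",
            f"  Archived: {archived} events",
        ]
    )


if __name__ == "__main__":
    print(format_focus_budget("Industrial Tier", ring=8, global_=4, archived=23))
```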
+
+### 5.2 Time Control & Pacing
+
+Players need control over simulation speed without losing track of events.
+
+**Single Tick (Next):**
+- Full event detail
+- Animation/transitions for changes
+- Automatic scroll to new events
+- Pause for player review
+
+**Batch Run:**
+- Progress indicator during execution
+- Aggregate summary on completion
+- Highlight significant events that occurred
+- "Review Details" option to step through tick-by-tick
+
+**Batch Summary Panel:**
+```
+┌─────────────────────────────────────────┐
+│ RAN 5 TICKS (T247 → T252)               │
+├─────────────────────────────────────────┤
+│ Stability: 0.78 → 0.71 (↓ 0.07)         │
+│ Critical Events: 2                      │
+│   • Energy crisis deepened (Industrial) │
+│   • Story seed "Power Struggle" active  │
+│ Faction Shifts:                         │
+│   Union +0.05, Council -0.03            │
+│ Market: Energy spiked to 1.42           │
+├─────────────────────────────────────────┤
+│ [Review Tick-by-Tick] [Continue]        │
+└─────────────────────────────────────────┘
+```
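The before/after lines in the summary, and the ↑↓→ trend indicators used elsewhere, reduce to a signed delta. A sketch under assumptions: the helper names are hypothetical, and the 0.005 tolerance (changes that would display as 0.00 count as flat) is an illustrative choice.

```python
def trend_arrow(delta: float, tolerance: float = 0.005) -> str:
    """Pick a trend glyph for a change in value (hypothetical helper)."""
    if delta > tolerance:
        return "↑"
    if delta < -tolerance:
        return "↓"
    return "→"


def format_change(label: str, before: float, after: float) -> str:
    """Format one before/after line of the batch summary."""
    delta = after - before
    return f"{label}: {before:.2f} → {after:.2f} ({trend_arrow(delta)} {abs(delta):.2f})"


if __name__ == "__main__":
    print(format_change("Stability", 0.78, 0.71))
```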
+
+### 5.3 Explanation & Causality ("Why?")
+
+The "Why" system is critical for legible complexity.
+
+**Context-Sensitive Queries:**
+- Click "Why" with nothing selected → "Why did stability change?"
+- Click "Why" with district selected → "Why is Industrial Tier in crisis?"
+- Click "Why" with agent selected → "Why did Aria's negotiation fail?"
+- Click "Why" on event feed item → Causal chain for that specific event
+
+**Explanation Display:**
+```
+┌─────────────────────────────────────────────────────────────────┐
+│ WHY: Stability dropped from 0.78 to 0.71                        │
+├─────────────────────────────────────────────────────────────────┤
+│ Primary Causes:                                                 │
+│   1. Unrest rose in Industrial Tier (+0.08)                     │
+│      ← Energy shortage persisted 3+ ticks                       │
+│      ← Production fell below consumption                        │
+│                                                                 │
+│   2. Pollution diffused from Industrial to Commons              │
+│      ← Cartel sabotage in Industrial (T244)                     │
+│                                                                 │
+│ Contributing Factors:                                           │
+│   • Biodiversity below midpoint (recovery stalled)              │
+│   • No faction investment actions this window                   │
+│                                                                 │
+│ Suggested Actions:                                              │
+│   → Send agent to stabilize Industrial unrest                   │
+│   → Encourage faction investment in affected districts          │
+├─────────────────────────────────────────────────────────────────┤
+│ [View Full Timeline] [Close]                                    │
+└─────────────────────────────────────────────────────────────────┘
+```
+
+### 5.4 Agent Assignment
+
+Selecting and sending agents should feel quick and informed.
+
+**Assignment Flow:**
+1. Select task type (Inspect, Negotiate, Stabilize, Covert Op)
+2. Select target (district, faction, agent)
+3. System shows recommended agents with suitability scores
+4. Player confirms assignment
+5. Dispatch is confirmed immediately; the outcome resolves over the next tick(s)
+
+**Agent Recommendation Panel:**
+```
+┌─────────────────────────────────────────────────────────────────┐
+│ ASSIGN: Negotiate with Union of Flux                            │
+├─────────────────────────────────────────────────────────────────┤
+│ Recommended Agents:                                             │
+│                                                                 │
+│ ★ Aria Volt          Negotiation ●●●●○  Calm      → 78% est.   │
+│   Cassian Mire       Negotiation ●●○○○  Strained  → 52% est.   │
+│   Ilya Chen          Negotiation ●○○○○  Calm      → 45% est.   │
+│                                                                 │
+│ Note: Aria's expertise and reliability boost success odds.      │
+│ Cassian is strained; consider resting before high-stakes tasks. │
+├─────────────────────────────────────────────────────────────────┤
+│ [Confirm: Aria] [Back]                                          │
+└─────────────────────────────────────────────────────────────────┘
+```
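A sketch of how the recommendation list could be ordered once success estimates exist. How the estimates are computed is not specified here, so this example takes them as given; the `Candidate` type and both helpers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    stress: str      # e.g. "Calm" or "Strained"
    estimate: float  # estimated success odds in 0.0-1.0, supplied by the simulation


def rank_candidates(candidates: list[Candidate]) -> list[Candidate]:
    """Order candidates from highest to lowest estimated success."""
    return sorted(candidates, key=lambda c: c.estimate, reverse=True)


def format_rows(candidates: list[Candidate]) -> list[str]:
    """Render one line per candidate, starring the top recommendation."""
    rows = []
    for index, c in enumerate(rank_candidates(candidates)):
        marker = "★" if index == 0 else " "
        rows.append(f"{marker} {c.name:<14} {c.stress:<9} → {c.estimate:.0%} est.")
    return rows


if __name__ == "__main__":
    pool = [
        Candidate("Ilya Chen", "Calm", 0.45),
        Candidate("Aria Volt", "Calm", 0.78),
        Candidate("Cassian Mire", "Strained", 0.52),
    ]
    print("\n".join(format_rows(pool)))
```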
+
+---
+
+## 6. Campaign & Progression Screens
+
+### 6.1 Campaign Hub
+
+Accessed via Menu or dedicated tab for long-term planning.
+
+```
+┌─────────────────────────────────────────────────────────────────────┐
+│ CAMPAIGN: "Industrial Renaissance"                                  │
+│ World: Frontier City  │  Started: T0  │  Current: T247              │
+├─────────────────────────────────────────────────────────────────────┤
+│                                                                     │
+│ ACTIVE STORY SEEDS                                                  │
+│ ┌─────────────────┬───────────┬────────────┬───────────────────┐    │
+│ │ Seed            │ State     │ Location   │ Time Remaining    │    │
+│ ├─────────────────┼───────────┼────────────┼───────────────────┤    │
+│ │ Power Struggle  │ 🟢 Active │ Civic Core │ 8 ticks resolving │    │
+│ │ Plague Cluster  │ 🟡 Primed │ Commons    │ Cooldown: 15      │    │
+│ │ Rogue Terraformer│ ⚪ Archived│ Wilds     │ --                │    │
+│ └─────────────────┴───────────┴────────────┴───────────────────┘    │
+│                                                                     │
+│ PLAYER PROGRESSION                                                  │
+│   Access Tier: Established                                          │
+│   Skills: Diplomacy ●●●○○  Investigation ●●○○○  Economics ●○○○○    │
+│   Reputation: Union (Friendly), Council (Neutral), Cartel (Wary)    │
+│                                                                     │
+│ CAMPAIGN MILESTONES                                                 │
+│   ✓ First crisis resolved (T45)                                     │
+│   ✓ Faction alliance formed (T120)                                  │
+│   ○ Achieve district stability across 3+ zones                      │
+│   ○ Resolve "Power Struggle" seed                                   │
+│                                                                     │
+├─────────────────────────────────────────────────────────────────────┤
+│ [View Timeline] [Post-Mortem Preview] [End Campaign]                │
+└─────────────────────────────────────────────────────────────────────┘
+```
+
+### 6.2 Timeline View
+
+Causal history for understanding "how did we get here?"
+
+```
+┌─────────────────────────────────────────────────────────────────────┐
+│ TIMELINE                                         [Filter ▼] [Zoom] │
+├─────────────────────────────────────────────────────────────────────┤
+│                                                                     │
+│ T247 ──●── Energy crisis deepens (Industrial)                       │
+│        │     └─ Caused by: T244 sabotage, T240 underinvestment      │
+│        │                                                            │
+│ T244 ──●── Cartel sabotages Industrial Tier                         │
+│        │     └─ Triggered: Pollution spike, unrest rise             │
+│        │                                                            │
+│ T240 ──●── Union invests in Industrial (partial success)            │
+│        │                                                            │
+│ T235 ──●── Story Seed "Power Struggle" primed                       │
+│        │     └─ Preconditions met: faction tension, resource stress │
+│        │                                                            │
+│ T220 ──○── Player set focus to Industrial Tier                      │
+│                                                                     │
+│ [← Earlier]                                         [Later →]       │
+└─────────────────────────────────────────────────────────────────────┘
+```
+
+**Features:**
+- Major events shown as nodes on timeline
+- Causal links indicated with connecting lines
+- Filter by: Story seeds, Faction actions, Player actions, Crises
+- Zoom to adjust time granularity
+- Click event to see full explanation
+
+### 6.3 Post-Mortem Screen
+
+End-of-campaign or "what happened" recap.
+
+```
+┌─────────────────────────────────────────────────────────────────────┐
+│ POST-MORTEM: "Industrial Renaissance"                               │
+│ Duration: 247 ticks  │  Outcome: Stabilizing Technocracy            │
+├─────────────────────────────────────────────────────────────────────┤
+│                                                                     │
+│ CITY STATE                                                          │
+│   Stability: 0.71 (Recovering)                                      │
+│   Governance: Council-Corporate Alliance                            │
+│   Environment: Moderate pollution, biodiversity stressed            │
+│                                                                     │
+│ MAJOR STORY ARCS                                                    │
+│   ✓ "Power Struggle" - Resolved: Council retained control           │
+│   ✓ "Plague Cluster" - Resolved: Contained with Union aid           │
+│   ○ "Rogue Terraformer" - Never triggered                           │
+│                                                                     │
+│ FACTION OUTCOMES                                                    │
+│   Council: Dominant (0.75)  ↑ from 0.60                             │
+│   Union: Allied (0.68)  ↑ from 0.55                                 │
+│   Cartel: Marginalized (0.32)  ↓ from 0.50                          │
+│                                                                     │
+│ KEY TURNING POINTS                                                  │
+│   T120: Player brokered Union-Council alliance                      │
+│   T180: Cartel overreached with sabotage, lost legitimacy           │
+│   T220: Industrial crisis averted through coordinated investment    │
+│                                                                     │
+│ WHAT COULD HAVE BEEN                                                │
+│   • Cartel dominance if sabotage had succeeded at T180              │
+│   • Collapse scenario if energy crisis persisted past T260          │
+│                                                                     │
+├─────────────────────────────────────────────────────────────────────┤
+│ [Export Report] [New Campaign] [Return to Menu]                     │
+└─────────────────────────────────────────────────────────────────────┘
+```
+
+---
+
+## 7. Visual Design Principles
+
+### 7.1 Color Language
+
+Consistent color coding across all UI elements:
+
+| Color | Meaning | Usage |
+|-------|---------|-------|
+| **Green** | Healthy/Positive | Good metrics, successful actions, recovery |
+| **Yellow** | Caution/Neutral | Moderate levels, ongoing processes |
+| **Red** | Critical/Negative | Crises, failures, dangerous thresholds |
+| **Blue** | Information/Player | Selections, player actions, focus |
+| **Purple** | Story/Narrative | Story seeds, major events |
+| **Orange** | Economy/Resources | Market prices, shortages, trade |
+| **Gray** | Inactive/Archived | Unavailable options, past events |
+
+### 7.2 Typography Hierarchy
+
+- **Headers:** Bold, larger size for section titles
+- **Labels:** Medium weight for field names and categories
+- **Values:** Regular weight, potentially monospace for numbers
+- **Body:** Regular weight for descriptions and explanations
+- **Alerts:** Bold with color coding for urgency
+
+### 7.3 Iconography
+
+Consistent icons for quick recognition:
+
+| Icon | Meaning |
+|------|---------|
+| 🏙️ | City/District |
+| 👤 | Agent |
+| 🏛️ | Faction |
+| 📊 | Metrics/Stats |
+| 📖 | Story/Narrative |
+| ⚠️ | Warning/Alert |
+| ⚡ | Economy/Energy |
+| 🌿 | Environment/Biodiversity |
+| 🎯 | Focus |
+| ❓ | Explanation/Why |
+
+### 7.4 Motion & Feedback
+
+- **State Changes:** Subtle animations when values update (number ticker, bar fill)
+- **Selections:** Immediate highlight feedback on click
+- **Transitions:** Smooth panel slides when switching views
+- **Alerts:** Pulse animation for critical notifications
+- **Loading:** Progress indicators for batch operations
+
+---
+
+## 8. Accessibility Considerations
+
+### 8.1 Color Independence
+
+- All color-coded information has secondary indicators (icons, text labels, patterns)
+- High-contrast mode available for players with visual impairments
+- Avoid conveying critical information through color alone
+
+### 8.2 Keyboard Navigation
+
+- Full keyboard navigation for all interactions
+- Visible focus indicators
+- Logical tab order through UI elements
+- Shortcut keys for common actions (documented in help)
+
+### 8.3 Screen Reader Support
+
+- Semantic structure with proper headings
+- Alt text for visual elements
+- Live regions for dynamic updates (event feed)
+- Descriptive button labels
+
+### 8.4 Adjustable Pacing
+
+- Configurable batch sizes for time advancement
+- Pause functionality during batch runs
+- Event feed scroll-lock for reading
+- Optional confirmation dialogs for major actions
+
+---
+
+## 9. Progressive Disclosure
+
+### 9.1 Onboarding Layers
+
+**Layer 1 - First Session:**
+- Highlight core loop: Observe → Decide → Simulate
+- Focus on single district, limited actions
+- Tooltips explain each UI element on first encounter
+- Simplified event feed (critical events only)
+
+**Layer 2 - Early Campaigns:**
+- Introduce focus management
+- Unlock agent assignment complexity
+- Show faction dynamics
+- Full event feed with filters
+
+**Layer 3 - Experienced Play:**
+- Full timeline and causality tools
+- Advanced batch sweeps
+- Custom focus strategies
+- Post-mortem analysis depth
+
+### 9.2 Tooltip Strategy
+
+- **Hover tooltips:** Brief explanation of UI element purpose
+- **Extended tooltips:** Deeper explanation on sustained hover
+- **Contextual help:** "?" icon opens detailed help panel
+- **Tutorial triggers:** First-time actions prompt optional walkthrough
+
+---
+
+## 10. Implementation Priorities
+
+### Phase 1: Core Playability
+1. Global status bar with stability gauge
+2. Basic city map with district selection
+3. Event feed with severity coding
+4. Simple context panel (district info)
+5. Command bar with Next/Run/Save
+
+### Phase 2: Management Depth
+1. Agent roster view with assignment flow
+2. Faction overview panel
+3. Focus management UI
+4. Heat map overlays
+5. Batch run summary panel
+
+### Phase 3: Understanding & Reflection
+1. Why/Explanation system
+2. Timeline view with causality
+3. Campaign hub
+4. Post-mortem screen
+5. Progressive disclosure system
+
+### Phase 4: Polish & Accessibility
+1. Animation and feedback polish
+2. Complete keyboard navigation coverage
+3. Accessibility audit and fixes
+4. Onboarding refinement
+5. Help system integration
+
+---
+
+## 11. Success Metrics
+
+The UI should be evaluated against these player experience goals:
+
+| Metric | Target | Measurement |
+|--------|--------|-------------|
+| **Time to First Action** | < 30 seconds | New player can advance time within 30s |
+| **Crisis Detection** | < 5 seconds | Critical alerts noticed within 5s of appearing |
+| **Causality Understanding** | 80%+ accuracy | Players can explain why stability changed |
+| **Focus Comprehension** | 90%+ awareness | Players know which district is focused |
+| **Agent Selection Confidence** | Informed choice | Players use agent info when assigning |
+| **Session Satisfaction** | 4+/5 rating | Post-session player survey |
+
+---
+
+## 12. Open Questions
+
+- Should the event feed auto-pause on critical events, or just highlight?
+- How much automation is desirable for routine agent assignments?
+- What's the right balance between map-centric and list-centric views?
+- Should explanations be generated on-demand (LLM) or pre-computed?
+- How should faction relationships be visualized without overwhelming the map?
+
+---
+
+## See Also
+
+- [Game Design Document](./emergent_story_game_gdd.md) – Core game systems and philosophy
+- [How to Play Echoes](../gengine/how_to_play_echoes.md) – Current CLI interface documentation
+- [Implementation Plan](./emergent_story_game_implementation_plan.md) – Technical roadmap
diff --git a/pytest.ini b/pytest.ini
index b008c8bb..7d72dda6 100644
--- a/pytest.ini
+++ b/pytest.ini
@@ -1,6 +1,6 @@
 [pytest]
 pythonpath = src
-addopts = -q --cov=src/gengine --cov-report=term-missing --cov-report=xml --cov-fail-under=90
+addopts = -q --cov=src/gengine --cov=scripts --cov-report=term-missing --cov-report=xml --cov-fail-under=90
 markers =
     unit: Fast, isolated unit tests
     integration: Tests requiring multiple components or DB/Network
diff --git a/tests/scripts/test_eoe_dump_state.py b/tests/scripts/test_eoe_dump_state.py
new file mode 100644
index 00000000..2eca8fc9
--- /dev/null
+++ b/tests/scripts/test_eoe_dump_state.py
@@ -0,0 +1,240 @@
+"""Tests for the world bundle dump state utility."""
+
+from __future__ import annotations
+
+import argparse
+import json
+import sys
+from importlib import util
+from pathlib import Path
+from unittest.mock import patch
+
+import pytest
+import yaml
+
+_MODULE_PATH = Path(__file__).resolve().parents[2] / "scripts" / "eoe_dump_state.py"
+
+
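+def _load_dump_state_module():
+    """Load scripts/eoe_dump_state.py by file path and return the module."""
+    spec = util.spec_from_file_location("eoe_dump_state", _MODULE_PATH)
+    # Validate the spec before using it; module_from_spec fails on a None spec.
+    assert spec and spec.loader
+    module = util.module_from_spec(spec)
+    sys.modules.setdefault("eoe_dump_state", module)
+    spec.loader.exec_module(module)
+    return module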
+
+
+_mod = _load_dump_state_module()
+main = _mod.main
+
+
+def _create_minimal_world(world_dir: Path) -> None:
+    """Create a minimal valid world directory for testing."""
+    world_dir.mkdir(parents=True, exist_ok=True)
+    world_yml = {
+        "city": {
+            "id": "test-city",
+            "name": "Test City",
+            "districts": [
+                {
+                    "id": "core",
+                    "name": "Core District",
+                    "population": 10000,
+                }
+            ],
+        },
+        "factions": [
+            {"id": "test-faction", "name": "Test Faction"},
+        ],
+        "agents": [
+            {"id": "test-agent", "name": "Test Agent", "role": "Test"},
+        ],
+    }
+    (world_dir / "world.yml").write_text(yaml.safe_dump(world_yml), encoding="utf-8")
+
+    story_seeds = {
+        "story_seeds": [
+            {
+                "id": "test-seed",
+                "title": "Test Seed",
+                "summary": "A test story seed",
+                "stakes": "Test stakes",
+                "scope": "environment",
+                "preferred_districts": ["core"],
+                "cooldown_ticks": 10,
+                "tags": ["test"],
+                "triggers": [
+                    {
+                        "scope": "environment",
+                        "district_id": "core",
+                        "min_score": 0.5,
+                        "min_severity": 0.5,
+                    }
+                ],
+                "roles": {
+                    "agents": ["test-agent"],
+                    "factions": ["test-faction"],
+                },
+                "beats": ["Test beat"],
+                "resolution_templates": {
+                    "success": "Success",
+                    "failure": "Failure",
+                },
+                "followups": [],
+            }
+        ]
+    }
+    (world_dir / "story_seeds.yml").write_text(
+        yaml.safe_dump(story_seeds), encoding="utf-8"
+    )
+
+
+class TestMainFunction:
+    """Tests for the main CLI function."""
+
+    def test_main_loads_default_world(self, capsys) -> None:
+        """Test that main loads and displays the default world summary."""
+        # Use the actual default world in the repository
+        with patch("sys.argv", ["eoe_dump_state"]):
+            main()
+
+        captured = capsys.readouterr()
+        assert "Echoes of Emergence :: World Summary" in captured.out
+        # Check that summary fields are printed
+        assert ":" in captured.out
+
+    def test_main_loads_specified_world(self, capsys) -> None:
+        """Test loading a specified world bundle."""
+        with patch("sys.argv", ["eoe_dump_state", "--world", "default"]):
+            main()
+
+        captured = capsys.readouterr()
+        assert "Echoes of Emergence :: World Summary" in captured.out
+
+    def test_main_exports_snapshot(self, tmp_path: Path, capsys) -> None:
+        """Test exporting a snapshot to a JSON file."""
+        export_path = tmp_path / "snapshot.json"
+
+        with patch("sys.argv", ["eoe_dump_state", "-e", str(export_path)]):
+            main()
+
+        captured = capsys.readouterr()
+        assert "Snapshot written to" in captured.out
+        assert export_path.exists()
+
+        # Verify the snapshot contains valid JSON
+        data = json.loads(export_path.read_text())
+        assert data, "exported snapshot should not be empty"
+
+    def test_main_export_with_long_option(self, tmp_path: Path, capsys) -> None:
+        """Test exporting using --export option."""
+        export_path = tmp_path / "output" / "state.json"
+
+        with patch("sys.argv", ["eoe_dump_state", "--export", str(export_path)]):
+            main()
+
+        captured = capsys.readouterr()
+        assert "Snapshot written to" in captured.out
+        assert export_path.exists()
+
+    def test_main_with_world_short_option(self, capsys) -> None:
+        """Test using -w short option for world."""
+        with patch("sys.argv", ["eoe_dump_state", "-w", "default"]):
+            main()
+
+        captured = capsys.readouterr()
+        assert "Echoes of Emergence :: World Summary" in captured.out
+
+    def test_main_invalid_world_raises_error(self) -> None:
+        """Test that an invalid world name raises an error."""
+        with patch("sys.argv", ["eoe_dump_state", "--world", "nonexistent_world"]):
+            with pytest.raises((FileNotFoundError, ValueError)):
+                main()
+
+    def test_main_combined_options(self, tmp_path: Path, capsys) -> None:
+        """Test using both world and export options together."""
+        export_path = tmp_path / "combined.json"
+
+        with patch(
+            "sys.argv",
+            ["eoe_dump_state", "-w", "default", "-e", str(export_path)],
+        ):
+            main()
+
+        captured = capsys.readouterr()
+        assert "Echoes of Emergence :: World Summary" in captured.out
+        assert "Snapshot written to" in captured.out
+        assert export_path.exists()
+
+
+class TestSummaryOutput:
+    """Tests for summary output content."""
+
+    def test_summary_contains_expected_fields(self, capsys) -> None:
+        """Test that summary output contains expected game state fields."""
+        with patch("sys.argv", ["eoe_dump_state"]):
+            main()
+
+        captured = capsys.readouterr()
+        # Summary should include some key information
+        # The exact fields depend on GameState.summary() implementation
+        assert ":" in captured.out
+        lines = captured.out.strip().split("\n")
+        # Should have multiple lines (header + at least one field)
+        assert len(lines) > 1
+
+
+class TestArgumentParsing:
+    """Tests for CLI argument parsing."""
+
+    def test_help_option(self, capsys) -> None:
+        """Test that --help option works."""
+        with patch("sys.argv", ["eoe_dump_state", "--help"]):
+            with pytest.raises(SystemExit) as exc_info:
+                main()
+            assert exc_info.value.code == 0
+
+        captured = capsys.readouterr()
+        assert "--world" in captured.out
+        assert "--export" in captured.out
+
+    def test_default_world_value(self) -> None:
+        """Test that default world is 'default' when not specified."""
+        # Mirror of the script's parser; this checks the expected defaults only
+        parser = argparse.ArgumentParser()
+        parser.add_argument("--world", "-w", default="default")
+        parser.add_argument("--export", "-e", type=Path, default=None)
+
+        args = parser.parse_args([])
+        assert args.world == "default"
+        assert args.export is None
+
+
+class TestRealWorldIntegration:
+    """Integration tests using actual repository content."""
+
+    def test_loads_repository_default_world(self, capsys) -> None:
+        """Test loading the actual default world from the repository."""
+        repo_root = Path(__file__).resolve().parents[2]
+        content_dir = repo_root / "content" / "worlds" / "default"
+
+        if not content_dir.exists():
+            pytest.skip("Default world content not found")
+
+        with patch("sys.argv", ["eoe_dump_state", "--world", "default"]):
+            main()
+
+        captured = capsys.readouterr()
+        assert "Echoes of Emergence :: World Summary" in captured.out
+
+    def test_export_creates_valid_json(self, tmp_path: Path) -> None:
+        """Test that exported snapshot is valid JSON with expected structure."""
+        export_path = tmp_path / "valid.json"
+
+        with patch("sys.argv", ["eoe_dump_state", "-e", str(export_path)]):
+            main()
+
+        assert export_path.exists()
+        data = json.loads(export_path.read_text())
+        # The snapshot should be a dictionary with game state data
+        assert isinstance(data, dict)
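Reviewer note: the tests above drive `main()` by patching `sys.argv`. The sketch below illustrates why that works, under the assumption that the script calls `parse_args()` with no explicit argument list. `parse_cli` is a hypothetical stand-in, not the script's actual function; its two options are copied from the parser mirrored in `test_default_world_value`.

```python
from __future__ import annotations

import argparse
from pathlib import Path
from unittest.mock import patch


def parse_cli() -> argparse.Namespace:
    """Hypothetical stand-in for the script's argument handling."""
    parser = argparse.ArgumentParser(prog="eoe_dump_state")
    parser.add_argument("--world", "-w", default="default")
    parser.add_argument("--export", "-e", type=Path, default=None)
    # With no explicit list, argparse reads sys.argv[1:] at call time,
    # which is what makes patching sys.argv effective.
    return parser.parse_args()


# Explicit options are picked up from the patched argv.
with patch("sys.argv", ["eoe_dump_state", "-w", "default", "-e", "out.json"]):
    args = parse_cli()

# With only the program name, the defaults apply.
with patch("sys.argv", ["eoe_dump_state"]):
    default_args = parse_cli()
```

If the script's `main()` accepted an optional `argv` list, as the other scripts under test here do, the patching could be dropped in favour of `main([...])`.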
diff --git a/tests/scripts/test_plot_environment_trajectories.py b/tests/scripts/test_plot_environment_trajectories.py
new file mode 100644
index 00000000..47d8da7b
--- /dev/null
+++ b/tests/scripts/test_plot_environment_trajectories.py
@@ -0,0 +1,478 @@
+"""Tests for the environment trajectories plotting script."""
+
+from __future__ import annotations
+
+import json
+import sys
+from importlib import util
+from pathlib import Path
+from unittest.mock import MagicMock, patch
+
+import pytest
+
+_MODULE_PATH = (
+    Path(__file__).resolve().parents[2] / "scripts" / "plot_environment_trajectories.py"
+)
+
+
+def _load_plot_module():
+    spec = util.spec_from_file_location("plot_environment_trajectories", _MODULE_PATH)
+    assert spec and spec.loader
+    module = util.module_from_spec(spec)
+    sys.modules.setdefault("plot_environment_trajectories", module)
+    spec.loader.exec_module(module)
+    return module
+
+
+_mod = _load_plot_module()
+parse_args = _mod.parse_args
+main = _mod.main
+_collect_runs = _mod._collect_runs
+_parse_run_spec = _mod._parse_run_spec
+_extract_series = _mod._extract_series
+DEFAULT_RUNS = _mod.DEFAULT_RUNS
+
+
+class TestParseArgs:
+    """Tests for the parse_args function."""
+
+    def test_default_values(self) -> None:
+        """Test that parse_args returns expected defaults."""
+        args = parse_args([])
+        assert args.run is None
+        assert args.output is None
+        assert args.title == "Environment trajectories"
+
+    def test_single_run_argument(self) -> None:
+        """Test parsing a single --run argument."""
+        args = parse_args(["--run", "test=path/to/file.json"])
+        assert args.run == ["test=path/to/file.json"]
+
+    def test_multiple_run_arguments(self) -> None:
+        """Test parsing multiple --run arguments."""
+        args = parse_args([
+            "--run", "run1=path1.json",
+            "--run", "run2=path2.json",
+            "--run", "run3=path3.json",
+        ])
+        assert args.run == ["run1=path1.json", "run2=path2.json", "run3=path3.json"]
+
+    def test_output_argument(self) -> None:
+        """Test parsing --output argument."""
+        args = parse_args(["--output", "/tmp/plot.png"])
+        assert args.output == Path("/tmp/plot.png")
+
+    def test_title_argument(self) -> None:
+        """Test parsing --title argument."""
+        args = parse_args(["--title", "Custom Title"])
+        assert args.title == "Custom Title"
+
+    def test_combined_arguments(self) -> None:
+        """Test parsing multiple arguments together."""
+        args = parse_args([
+            "--run", "test=file.json",
+            "--output", "out.png",
+            "--title", "My Plot",
+        ])
+        assert args.run == ["test=file.json"]
+        assert args.output == Path("out.png")
+        assert args.title == "My Plot"
+
+
+class TestParseRunSpec:
+    """Tests for the _parse_run_spec helper function."""
+
+    def test_label_equals_path_format(self) -> None:
+        """Test parsing 'label=path' format."""
+        label, path = _parse_run_spec("cushioned=build/test.json")
+        assert label == "cushioned"
+        assert path == Path("build/test.json")
+
+    def test_path_only_format(self) -> None:
+        """Test parsing path-only format (uses stem as label)."""
+        label, path = _parse_run_spec("build/test-results.json")
+        assert label == "test-results"
+        assert path == Path("build/test-results.json")
+
+    def test_label_with_spaces(self) -> None:
+        """Test parsing label with spaces around equals sign."""
+        label, path = _parse_run_spec(" my-label = path/to/file.json ")
+        assert label == "my-label"
+        assert path == Path("path/to/file.json")
+
+    def test_path_with_multiple_equals(self) -> None:
+        """Test parsing spec with multiple equals signs."""
+        label, path = _parse_run_spec("label=path/with=equals.json")
+        assert label == "label"
+        assert path == Path("path/with=equals.json")
+
+
+class TestCollectRuns:
+    """Tests for the _collect_runs function."""
+
+    def test_collect_from_run_args(self) -> None:
+        """Test collecting runs from explicit run arguments."""
+        runs = _collect_runs(["label1=path1.json", "label2=path2.json"])
+        assert "label1" in runs
+        assert "label2" in runs
+        assert runs["label1"] == Path("path1.json")
+        assert runs["label2"] == Path("path2.json")
+
+    def test_collect_empty_when_no_args_and_no_defaults(self) -> None:
+        """Test that an empty dict is returned when no args and no defaults exist."""
+        with patch.dict(_mod.__dict__, {"DEFAULT_RUNS": {}}):
+            runs = _collect_runs(None)
+            assert runs == {}
+
+    def test_collect_defaults_when_exist(self, tmp_path: Path) -> None:
+        """Test that defaults are collected when they exist."""
+        test_file = tmp_path / "test.json"
+        test_file.write_text("{}")
+
+        # Temporarily modify DEFAULT_RUNS for this test
+        original_defaults = dict(DEFAULT_RUNS)
+        DEFAULT_RUNS.clear()
+        DEFAULT_RUNS["test"] = test_file
+
+        try:
+            runs = _collect_runs(None)
+            assert "test" in runs
+            assert runs["test"] == test_file
+        finally:
+            DEFAULT_RUNS.clear()
+            DEFAULT_RUNS.update(original_defaults)
+
+    def test_collect_skips_nonexistent_defaults(self) -> None:
+        """Test that nonexistent default files are skipped."""
+        original_defaults = dict(DEFAULT_RUNS)
+        DEFAULT_RUNS.clear()
+        DEFAULT_RUNS["nonexistent"] = Path("/nonexistent/path.json")
+
+        try:
+            runs = _collect_runs(None)
+            assert "nonexistent" not in runs
+        finally:
+            DEFAULT_RUNS.clear()
+            DEFAULT_RUNS.update(original_defaults)
+
+
+class TestExtractSeries:
+    """Tests for the _extract_series function."""
+
+    def test_extract_from_director_history(self, tmp_path: Path) -> None:
+        """Test extracting series from director_history format."""
+        data = {
+            "director_history": [
+                {"tick": 0, "environment": {"pollution": 0.1, "unrest": 0.2}},
+                {"tick": 10, "environment": {"pollution": 0.15, "unrest": 0.25}},
+                {"tick": 20, "environment": {"pollution": 0.2, "unrest": 0.3}},
+            ]
+        }
+        test_file = tmp_path / "test.json"
+        test_file.write_text(json.dumps(data))
+
+        ticks, pollution, unrest = _extract_series(test_file)
+
+        assert ticks == [0, 10, 20]
+        assert pollution == [0.1, 0.15, 0.2]
+        assert unrest == [0.2, 0.25, 0.3]
+
+    def test_extract_fallback_to_last_environment(self, tmp_path: Path) -> None:
+        """Test fallback to last_environment when no director_history."""
+        data = {
+            "end_tick": 100,
+            "last_environment": {
+                "pollution": 0.5,
+                "unrest": 0.6,
+            },
+        }
+        test_file = tmp_path / "test.json"
+        test_file.write_text(json.dumps(data))
+
+        ticks, pollution, unrest = _extract_series(test_file)
+
+        assert ticks == [100]
+        assert pollution == [0.5]
+        assert unrest == [0.6]
+
+    def test_extract_empty_when_no_data(self, tmp_path: Path) -> None:
+        """Test that empty lists are returned when there is no relevant data."""
+        data = {"other_field": "value"}
+        test_file = tmp_path / "test.json"
+        test_file.write_text(json.dumps(data))
+
+        ticks, pollution, unrest = _extract_series(test_file)
+
+        assert ticks == []
+        assert pollution == []
+        assert unrest == []
+
+    def test_extract_handles_missing_environment(self, tmp_path: Path) -> None:
+        """Test handling entries missing environment data."""
+        data = {
+            "director_history": [
+                {"tick": 0, "environment": {"pollution": 0.1, "unrest": 0.2}},
+                {"tick": 10},  # Missing environment
+                {"tick": 20, "environment": {"pollution": 0.3, "unrest": 0.4}},
+            ]
+        }
+        test_file = tmp_path / "test.json"
+        test_file.write_text(json.dumps(data))
+
+        ticks, pollution, unrest = _extract_series(test_file)
+
+        # Should skip entries without environment
+        assert ticks == [0, 20]
+        assert pollution == [0.1, 0.3]
+        assert unrest == [0.2, 0.4]
+
+    def test_extract_sorts_by_tick(self, tmp_path: Path) -> None:
+        """Test that entries are sorted by tick."""
+        data = {
+            "director_history": [
+                {"tick": 20, "environment": {"pollution": 0.3, "unrest": 0.4}},
+                {"tick": 0, "environment": {"pollution": 0.1, "unrest": 0.2}},
+                {"tick": 10, "environment": {"pollution": 0.2, "unrest": 0.3}},
+            ]
+        }
+        test_file = tmp_path / "test.json"
+        test_file.write_text(json.dumps(data))
+
+        ticks, pollution, unrest = _extract_series(test_file)
+
+        assert ticks == [0, 10, 20]
+        assert pollution == [0.1, 0.2, 0.3]
+        assert unrest == [0.2, 0.3, 0.4]
+
+    def test_extract_default_values(self, tmp_path: Path) -> None:
+        """Test that missing pollution/unrest default to 0.0."""
+        data = {
+            "director_history": [
+                {"tick": 0, "environment": {"pollution": 0.0}},  # Missing unrest
+            ]
+        }
+        test_file = tmp_path / "test.json"
+        test_file.write_text(json.dumps(data))
+
+        ticks, pollution, unrest = _extract_series(test_file)
+
+        assert ticks == [0]
+        assert pollution == [0.0]
+        assert unrest == [0.0]  # Should default to 0.0
+
+
+class TestMainFunction:
+    """Tests for the main function."""
+
+    def test_main_exits_when_no_files_found(self) -> None:
+        """Test that main exits with error when no files found."""
+        with patch.dict(_mod.__dict__, {"DEFAULT_RUNS": {}}):
+            with pytest.raises(SystemExit) as exc_info:
+                main([])
+            assert exc_info.value.code not in (None, 0)
+
+    @patch("matplotlib.pyplot.show")
+    @patch("matplotlib.pyplot.subplots")
+    def test_main_creates_plot_with_run_args(
+        self, mock_subplots: MagicMock, mock_show: MagicMock, tmp_path: Path
+    ) -> None:
+        """Test that main creates a plot when run arguments are provided."""
+        # Create test data file
+        data = {
+            "director_history": [
+                {"tick": 0, "environment": {"pollution": 0.1, "unrest": 0.2}},
+                {"tick": 10, "environment": {"pollution": 0.2, "unrest": 0.3}},
+            ]
+        }
+        test_file = tmp_path / "test.json"
+        test_file.write_text(json.dumps(data))
+
+        # Set up mocks
+        mock_fig = MagicMock()
+        mock_ax_pollution = MagicMock()
+        mock_ax_unrest = MagicMock()
+        mock_subplots.return_value = (mock_fig, (mock_ax_pollution, mock_ax_unrest))
+
+        result = main(["--run", f"test={test_file}"])
+
+        assert result == 0
+        mock_subplots.assert_called_once()
+        mock_show.assert_called_once()
+
+    @patch("matplotlib.pyplot.subplots")
+    def test_main_saves_to_output_file(
+        self, mock_subplots: MagicMock, tmp_path: Path
+    ) -> None:
+        """Test that main saves plot to output file when specified."""
+        # Create test data file
+        data = {
+            "director_history": [
+                {"tick": 0, "environment": {"pollution": 0.1, "unrest": 0.2}},
+                {"tick": 10, "environment": {"pollution": 0.2, "unrest": 0.3}},
+            ]
+        }
+        test_file = tmp_path / "test.json"
+        test_file.write_text(json.dumps(data))
+
+        output_file = tmp_path / "output" / "plot.png"
+
+        # Set up mocks
+        mock_fig = MagicMock()
+        mock_ax_pollution = MagicMock()
+        mock_ax_unrest = MagicMock()
+        mock_subplots.return_value = (mock_fig, (mock_ax_pollution, mock_ax_unrest))
+
+        result = main([
+            "--run", f"test={test_file}",
+            "--output", str(output_file),
+        ])
+
+        assert result == 0
+        mock_fig.savefig.assert_called_once()
+        # Verify parent directory was created
+        assert output_file.parent.exists()
+
+    @patch("matplotlib.pyplot.show")
+    @patch("matplotlib.pyplot.subplots")
+    def test_main_uses_custom_title(
+        self, mock_subplots: MagicMock, mock_show: MagicMock, tmp_path: Path
+    ) -> None:
+        """Test that main uses custom title when specified."""
+        # Create test data file
+        data = {
+            "director_history": [
+                {"tick": 0, "environment": {"pollution": 0.1, "unrest": 0.2}},
+            ]
+        }
+        test_file = tmp_path / "test.json"
+        test_file.write_text(json.dumps(data))
+
+        # Set up mocks
+        mock_fig = MagicMock()
+        mock_ax_pollution = MagicMock()
+        mock_ax_unrest = MagicMock()
+        mock_subplots.return_value = (mock_fig, (mock_ax_pollution, mock_ax_unrest))
+
+        result = main([
+            "--run", f"test={test_file}",
+            "--title", "Custom Plot Title",
+        ])
+
+        assert result == 0
+        mock_fig.suptitle.assert_called_with("Custom Plot Title")
+
+    @patch("matplotlib.pyplot.show")
+    @patch("matplotlib.pyplot.subplots")
+    def test_main_prints_warning_for_few_samples(
+        self, mock_subplots: MagicMock, mock_show: MagicMock, tmp_path: Path, capsys
+    ) -> None:
+        """Test that main prints a warning when few samples are available."""
+        # Create test data file with only one sample
+        data = {
+            "director_history": [
+                {"tick": 0, "environment": {"pollution": 0.1, "unrest": 0.2}},
+            ]
+        }
+        test_file = tmp_path / "test.json"
+        test_file.write_text(json.dumps(data))
+
+        # Set up mocks
+        mock_fig = MagicMock()
+        mock_ax_pollution = MagicMock()
+        mock_ax_unrest = MagicMock()
+        mock_subplots.return_value = (mock_fig, (mock_ax_pollution, mock_ax_unrest))
+
+        result = main(["--run", f"test={test_file}"])
+
+        assert result == 0
+        captured = capsys.readouterr()
+        assert "Warning" in captured.out
+        assert "1 sample" in captured.out
+
+    @patch("matplotlib.pyplot.show")
+    @patch("matplotlib.pyplot.subplots")
+    def test_main_multiple_runs(
+        self, mock_subplots: MagicMock, mock_show: MagicMock, tmp_path: Path
+    ) -> None:
+        """Test that main handles multiple run files."""
+        # Create multiple test data files
+        for i, name in enumerate(["run1", "run2", "run3"]):
+            data = {
+                "director_history": [
+                    {
+                        "tick": 0,
+                        "environment": {"pollution": 0.1 * (i + 1), "unrest": 0.2},
+                    },
+                    {
+                        "tick": 10,
+                        "environment": {"pollution": 0.2 * (i + 1), "unrest": 0.3},
+                    },
+                ]
+            }
+            test_file = tmp_path / f"{name}.json"
+            test_file.write_text(json.dumps(data))
+
+        # Set up mocks
+        mock_fig = MagicMock()
+        mock_ax_pollution = MagicMock()
+        mock_ax_unrest = MagicMock()
+        mock_subplots.return_value = (mock_fig, (mock_ax_pollution, mock_ax_unrest))
+
+        result = main([
+            "--run", f"run1={tmp_path / 'run1.json'}",
+            "--run", f"run2={tmp_path / 'run2.json'}",
+            "--run", f"run3={tmp_path / 'run3.json'}",
+        ])
+
+        assert result == 0
+        # Verify plot was called for each run
+        assert mock_ax_pollution.plot.call_count == 3
+        assert mock_ax_unrest.plot.call_count == 3
+
+
+class TestEdgeCases:
+    """Tests for edge cases and error handling."""
+
+    def test_extract_series_handles_invalid_json(self, tmp_path: Path) -> None:
+        """Test that extract_series raises on invalid JSON."""
+        test_file = tmp_path / "invalid.json"
+        test_file.write_text("not valid json")
+
+        with pytest.raises(json.JSONDecodeError):
+            _extract_series(test_file)
+
+    def test_extract_series_handles_missing_tick(self, tmp_path: Path) -> None:
+        """Test handling entries with missing tick field."""
+        data = {
+            "director_history": [
+                {"environment": {"pollution": 0.1, "unrest": 0.2}},  # Missing tick
+                {"tick": 10, "environment": {"pollution": 0.2, "unrest": 0.3}},
+            ]
+        }
+        test_file = tmp_path / "test.json"
+        test_file.write_text(json.dumps(data))
+
+        ticks, pollution, unrest = _extract_series(test_file)
+
+        # Entry with missing tick should be skipped
+        assert ticks == [10]
+        assert pollution == [0.2]
+        assert unrest == [0.3]
+
+    def test_parse_run_spec_empty_label(self) -> None:
+        """Test parsing spec with empty label."""
+        label, path = _parse_run_spec("=path/file.json")
+        assert label == ""
+        assert path == Path("path/file.json")
+
+    def test_help_displays_description(self, capsys) -> None:
+        """Test that --help displays script description."""
+        with pytest.raises(SystemExit) as exc_info:
+            parse_args(["--help"])
+        assert exc_info.value.code == 0
+
+        captured = capsys.readouterr()
+        assert (
+            "pollution" in captured.out.lower()
+            or "trajectories" in captured.out.lower()
+        )
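Reviewer note: the `_parse_run_spec` tests above pin down a small contract: split on the first `=`, strip whitespace around both parts, and fall back to the file stem when no label is given. The script itself is not part of this diff, so the function below is only a minimal sketch that satisfies those tests. The name `parse_run_spec` and its body are assumptions, not the script's implementation.

```python
from __future__ import annotations

from pathlib import Path


def parse_run_spec(spec: str) -> tuple[str, Path]:
    """Hypothetical sketch: split 'label=path', else use the stem as label."""
    if "=" in spec:
        # Split on the first '=' only, so the path may itself contain '='.
        label, raw_path = spec.split("=", 1)
        return label.strip(), Path(raw_path.strip())
    path = Path(spec.strip())
    return path.stem, path
```

Note that an empty label (`"=path/file.json"`) is passed through as `""`, matching `test_parse_run_spec_empty_label`. Whether that should instead fall back to the stem is a design question the tests currently answer with "no".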
diff --git a/tests/scripts/test_run_ai_observer.py b/tests/scripts/test_run_ai_observer.py
new file mode 100644
index 00000000..556c0661
--- /dev/null
+++ b/tests/scripts/test_run_ai_observer.py
@@ -0,0 +1,463 @@
+"""Tests for the AI Observer CLI runner script."""
+
+from __future__ import annotations
+
+import json
+import sys
+from importlib import util
+from pathlib import Path
+from unittest.mock import MagicMock, patch
+
+import pytest
+
+_MODULE_PATH = (
+    Path(__file__).resolve().parents[2] / "scripts" / "run_ai_observer.py"
+)
+
+
+def _load_observer_module():
+    spec = util.spec_from_file_location("run_ai_observer", _MODULE_PATH)
+    assert spec and spec.loader
+    module = util.module_from_spec(spec)
+    sys.modules.setdefault("run_ai_observer", module)
+    spec.loader.exec_module(module)
+    return module
+
+
+_mod = _load_observer_module()
+run_ai_observer = _mod.run_ai_observer
+main = _mod.main
+
+
+class TestRunAiObserver:
+    """Tests for the run_ai_observer function."""
+
+    def test_run_with_default_parameters(self) -> None:
+        """Test running observer with default parameters."""
+        result = run_ai_observer(ticks=10)
+
+        assert "mode" in result
+        assert result["mode"].startswith("local:")
+        assert "config" in result
+        assert result["config"]["tick_budget"] == 10
+
+    def test_run_with_custom_world(self) -> None:
+        """Test running observer with specified world."""
+        result = run_ai_observer(world="default", ticks=10, analysis_interval=5)
+
+        assert result["mode"] == "local:default"
+        assert "ticks_observed" in result
+
+    def test_run_with_custom_analysis_interval(self) -> None:
+        """Test running observer with custom analysis interval."""
+        result = run_ai_observer(ticks=20, analysis_interval=5)
+
+        assert result["config"]["analysis_interval"] == 5
+
+    def test_run_with_custom_thresholds(self) -> None:
+        """Test running observer with custom threshold values."""
+        result = run_ai_observer(
+            ticks=10,
+            stability_threshold=0.3,
+            legitimacy_threshold=0.2,
+        )
+
+        assert result["config"]["stability_alert_threshold"] == 0.3
+        assert result["config"]["legitimacy_swing_threshold"] == 0.2
+
+    def test_run_writes_output_file(self, tmp_path: Path) -> None:
+        """Test that output file is written when specified."""
+        output_path = tmp_path / "output.json"
+
+        run_ai_observer(ticks=10, analysis_interval=5, output=output_path)
+
+        assert output_path.exists()
+        saved_data = json.loads(output_path.read_text())
+        assert saved_data["config"]["tick_budget"] == 10
+        assert "mode" in saved_data
+
+    def test_run_creates_output_directory(self, tmp_path: Path) -> None:
+        """Test that output directory is created if it doesn't exist."""
+        output_path = tmp_path / "nested" / "dir" / "output.json"
+
+        run_ai_observer(ticks=10, analysis_interval=5, output=output_path)
+
+        assert output_path.exists()
+
+    def test_run_returns_report_dict(self) -> None:
+        """Test that run_ai_observer returns a dictionary with expected keys."""
+        result = run_ai_observer(ticks=10)
+
+        # Check expected keys from ObservationReport.to_dict()
+        assert "ticks_observed" in result
+        assert "start_tick" in result
+        assert "end_tick" in result
+        assert "stability_trend" in result
+        assert "faction_swings" in result
+        assert "alerts" in result
+        assert "commentary" in result
+        assert "environment_summary" in result
+
+    def test_run_with_verbose_mode(self, capsys) -> None:
+        """Test running with verbose mode enabled."""
+        result = run_ai_observer(ticks=10, analysis_interval=5, verbose=True)
+
+        # Verbose mode should still return a valid result
+        assert "ticks_observed" in result
+
+
+class TestRunAiObserverServiceMode:
+    """Tests for service mode of run_ai_observer."""
+
+    def test_service_mode_creates_observer_from_service(self) -> None:
+        """Test that service mode uses create_observer_from_service."""
+        # Set up mock observer
+        mock_report = MagicMock()
+        mock_report.to_dict.return_value = {
+            "ticks_observed": 10,
+            "start_tick": 0,
+            "end_tick": 10,
+            "stability_trend": {},
+            "faction_swings": {},
+            "story_seeds_activated": [],
+            "alerts": [],
+            "commentary": [],
+            "environment_summary": {},
+        }
+        mock_observer = MagicMock()
+        mock_observer.observe.return_value = mock_report
+        mock_observer._client = None
+
+        with patch.object(
+            _mod, "create_observer_from_service", return_value=mock_observer
+        ) as mock_create_observer:
+            result = run_ai_observer(service_url="http://localhost:8000", ticks=10)
+
+            mock_create_observer.assert_called_once()
+            assert result["mode"] == "service:http://localhost:8000"
+
+    def test_service_mode_handles_connection_error(self) -> None:
+        """Test that service mode handles connection errors appropriately."""
+        mock_observer = MagicMock()
+        mock_observer.observe.side_effect = ConnectionError("Service unavailable")
+        mock_observer._client = None
+
+        with patch.object(
+            _mod, "create_observer_from_service", return_value=mock_observer
+        ):
+            with pytest.raises(ConnectionError):
+                run_ai_observer(service_url="http://localhost:8000", ticks=10)
+
+    def test_service_mode_closes_client(self) -> None:
+        """Test that service mode closes client connection."""
+        mock_report = MagicMock()
+        mock_report.to_dict.return_value = {
+            "ticks_observed": 10,
+            "start_tick": 0,
+            "end_tick": 10,
+            "stability_trend": {},
+            "faction_swings": {},
+            "story_seeds_activated": [],
+            "alerts": [],
+            "commentary": [],
+            "environment_summary": {},
+        }
+        mock_client = MagicMock()
+        mock_observer = MagicMock()
+        mock_observer.observe.return_value = mock_report
+        mock_observer._client = mock_client
+
+        with patch.object(
+            _mod, "create_observer_from_service", return_value=mock_observer
+        ):
+            run_ai_observer(service_url="http://localhost:8000", ticks=10)
+
+            mock_client.close.assert_called_once()
+
+
+class TestMainCLI:
+    """Tests for the main CLI entry point."""
+
+    def test_main_default_arguments(self, capsys) -> None:
+        """Test main with default arguments produces valid JSON output."""
+        exit_code = main(["--ticks", "10"])
+
+        assert exit_code == 0
+        captured = capsys.readouterr()
+        data = json.loads(captured.out)
+        assert "ticks_observed" in data
+        assert "mode" in data
+
+    def test_main_with_world_argument(self, capsys) -> None:
+        """Test main with --world argument."""
+        exit_code = main(["--world", "default", "--ticks", "10"])
+
+        assert exit_code == 0
+        captured = capsys.readouterr()
+        data = json.loads(captured.out)
+        assert data["mode"] == "local:default"
+
+    def test_main_with_short_world_argument(self, capsys) -> None:
+        """Test main with -w short argument."""
+        exit_code = main(["-w", "default", "-t", "10"])
+
+        assert exit_code == 0
+        captured = capsys.readouterr()
+        data = json.loads(captured.out)
+        assert data["mode"] == "local:default"
+
+    def test_main_with_output_file(self, tmp_path: Path, capsys) -> None:
+        """Test main with output file argument."""
+        output_path = tmp_path / "result.json"
+
+        exit_code = main(["-t", "10", "-o", str(output_path)])
+
+        assert exit_code == 0
+        assert output_path.exists()
+
+        # Also printed to stdout
+        captured = capsys.readouterr()
+        data = json.loads(captured.out)
+        assert "ticks_observed" in data
+
+    def test_main_with_analysis_interval(self, capsys) -> None:
+        """Test main with --analysis-interval argument."""
+        exit_code = main(["--ticks", "20", "--analysis-interval", "5"])
+
+        assert exit_code == 0
+        captured = capsys.readouterr()
+        data = json.loads(captured.out)
+        assert data["config"]["analysis_interval"] == 5
+
+    def test_main_with_stability_threshold(self, capsys) -> None:
+        """Test main with --stability-threshold argument."""
+        exit_code = main(["--ticks", "10", "--stability-threshold", "0.3"])
+
+        assert exit_code == 0
+        captured = capsys.readouterr()
+        data = json.loads(captured.out)
+        assert data["config"]["stability_alert_threshold"] == 0.3
+
+    def test_main_with_legitimacy_threshold(self, capsys) -> None:
+        """Test main with --legitimacy-threshold argument."""
+        exit_code = main(["--ticks", "10", "--legitimacy-threshold", "0.15"])
+
+        assert exit_code == 0
+        captured = capsys.readouterr()
+        data = json.loads(captured.out)
+        assert data["config"]["legitimacy_swing_threshold"] == 0.15
+
+    def test_main_with_verbose_flag(self, capsys) -> None:
+        """Test main with --verbose flag."""
+        exit_code = main(["--ticks", "10", "--verbose"])
+
+        assert exit_code == 0
+        captured = capsys.readouterr()
+        # Output should still be valid JSON
+        data = json.loads(captured.out)
+        assert "ticks_observed" in data
+
+    def test_main_with_short_verbose_flag(self, capsys) -> None:
+        """Test main with -v short flag."""
+        exit_code = main(["-t", "10", "-v"])
+
+        assert exit_code == 0
+
+    def test_main_combined_arguments(self, tmp_path: Path, capsys) -> None:
+        """Test main with multiple arguments combined."""
+        output_path = tmp_path / "combined.json"
+
+        exit_code = main([
+            "--world", "default",
+            "--ticks", "10",
+            "--analysis-interval", "5",
+            "--stability-threshold", "0.4",
+            "--legitimacy-threshold", "0.15",
+            "--output", str(output_path),
+        ])
+
+        assert exit_code == 0
+        assert output_path.exists()
+
+        captured = capsys.readouterr()
+        data = json.loads(captured.out)
+        assert data["mode"] == "local:default"
+        assert data["config"]["tick_budget"] == 10
+        assert data["config"]["analysis_interval"] == 5
+        assert data["config"]["stability_alert_threshold"] == 0.4
+        assert data["config"]["legitimacy_swing_threshold"] == 0.15
+
+    def test_main_help_option(self, capsys) -> None:
+        """Test that --help option displays help and exits."""
+        with pytest.raises(SystemExit) as exc_info:
+            main(["--help"])
+
+        assert exc_info.value.code == 0
+        captured = capsys.readouterr()
+        assert "--world" in captured.out
+        assert "--ticks" in captured.out
+        assert "--service-url" in captured.out
+        assert "--analysis-interval" in captured.out
+        assert "--stability-threshold" in captured.out
+        assert "--legitimacy-threshold" in captured.out
+        assert "--output" in captured.out
+        assert "--verbose" in captured.out
+
+
+class TestArgumentDefaults:
+    """Tests for CLI argument default values."""
+
+    def test_default_world(self, capsys) -> None:
+        """Test that default world is 'default'."""
+        exit_code = main(["--ticks", "10"])
+
+        assert exit_code == 0
+        captured = capsys.readouterr()
+        data = json.loads(captured.out)
+        assert data["mode"] == "local:default"
+
+    def test_default_ticks(self, capsys) -> None:
+        """Test that an explicit --ticks value overrides the default of 100."""
+        # Pass explicit ticks: running the default 100 would make the test slow
+        exit_code = main(["--ticks", "10"])
+
+        assert exit_code == 0
+        captured = capsys.readouterr()
+        data = json.loads(captured.out)
+        # We specified 10 ticks, so config should reflect that
+        assert data["config"]["tick_budget"] == 10
+
+    def test_default_analysis_interval(self, capsys) -> None:
+        """Test that default analysis interval is 10."""
+        exit_code = main(["--ticks", "20"])
+
+        assert exit_code == 0
+        captured = capsys.readouterr()
+        data = json.loads(captured.out)
+        assert data["config"]["analysis_interval"] == 10
+
+    def test_default_stability_threshold(self, capsys) -> None:
+        """Test that default stability threshold is 0.5."""
+        exit_code = main(["--ticks", "10"])
+
+        assert exit_code == 0
+        captured = capsys.readouterr()
+        data = json.loads(captured.out)
+        assert data["config"]["stability_alert_threshold"] == 0.5
+
+    def test_default_legitimacy_threshold(self, capsys) -> None:
+        """Test that default legitimacy threshold is 0.1."""
+        exit_code = main(["--ticks", "10"])
+
+        assert exit_code == 0
+        captured = capsys.readouterr()
+        data = json.loads(captured.out)
+        assert data["config"]["legitimacy_swing_threshold"] == 0.1
+
+
+class TestReportContent:
+    """Tests for observation report content."""
+
+    def test_report_contains_stability_trend(self, capsys) -> None:
+        """Test that report contains stability trend analysis."""
+        exit_code = main(["--ticks", "10"])
+
+        assert exit_code == 0
+        captured = capsys.readouterr()
+        data = json.loads(captured.out)
+        assert "stability_trend" in data
+        trend = data["stability_trend"]
+        assert "metric_name" in trend
+        assert "start_value" in trend
+        assert "end_value" in trend
+        assert "delta" in trend
+        assert "trend" in trend
+
+    def test_report_contains_faction_swings(self, capsys) -> None:
+        """Test that report contains faction swing analysis."""
+        exit_code = main(["--ticks", "10"])
+
+        assert exit_code == 0
+        captured = capsys.readouterr()
+        data = json.loads(captured.out)
+        assert "faction_swings" in data
+        assert isinstance(data["faction_swings"], dict)
+
+    def test_report_contains_environment_summary(self, capsys) -> None:
+        """Test that report contains environment summary."""
+        exit_code = main(["--ticks", "10"])
+
+        assert exit_code == 0
+        captured = capsys.readouterr()
+        data = json.loads(captured.out)
+        assert "environment_summary" in data
+        assert isinstance(data["environment_summary"], dict)
+
+    def test_report_contains_alerts_and_commentary(self, capsys) -> None:
+        """Test that report contains alerts and commentary."""
+        exit_code = main(["--ticks", "10"])
+
+        assert exit_code == 0
+        captured = capsys.readouterr()
+        data = json.loads(captured.out)
+        assert "alerts" in data
+        assert "commentary" in data
+        assert isinstance(data["alerts"], list)
+        assert isinstance(data["commentary"], list)
+
+
+class TestErrorHandling:
+    """Tests for error handling."""
+
+    def test_invalid_world_raises_error(self) -> None:
+        """Test that invalid world name raises an error."""
+        with pytest.raises((FileNotFoundError, ValueError)):
+            run_ai_observer(world="nonexistent_world", ticks=10)
+
+    def test_verbose_connection_error_message(self, capsys) -> None:
+        """Test that verbose mode prints connection error message."""
+        mock_observer = MagicMock()
+        mock_observer.observe.side_effect = ConnectionError("Service unavailable")
+        mock_observer._client = None
+
+        with patch.object(
+            _mod, "create_observer_from_service", return_value=mock_observer
+        ):
+            with pytest.raises(ConnectionError):
+                run_ai_observer(
+                    service_url="http://localhost:8000",
+                    ticks=10,
+                    verbose=True,
+                )
+
+            captured = capsys.readouterr()
+            assert "Connection error" in captured.out
+
+
+class TestIntegration:
+    """Integration tests with real world data."""
+
+    def test_full_observation_cycle(self) -> None:
+        """Test a complete observation cycle with real engine."""
+        result = run_ai_observer(
+            world="default",
+            ticks=20,
+            analysis_interval=10,
+        )
+
+        assert result["ticks_observed"] == 20
+        assert result["start_tick"] == 0
+        assert result["end_tick"] == 20
+
+        # Verify stability trend is reasonable
+        stability = result["stability_trend"]
+        assert 0.0 <= stability["start_value"] <= 1.0
+        assert 0.0 <= stability["end_value"] <= 1.0
+        assert stability["trend"] in ["increasing", "decreasing", "stable"]
+
+    @pytest.mark.parametrize("ticks", [10, 15, 20])
+    def test_observation_with_different_tick_budgets(self, ticks: int) -> None:
+        """Test observation with various tick budgets."""
+        result = run_ai_observer(world="default", ticks=ticks, analysis_interval=5)
+        assert result["ticks_observed"] == ticks
+        assert result["config"]["tick_budget"] == ticks