26 changes: 26 additions & 0 deletions .claude/rules/dev-quality-checklist.md
@@ -0,0 +1,26 @@
# Dev Quality Checklist

Quality patterns to apply during development, not just review.

## AC-to-Test Traceability

Before marking a story done, fill the AC-to-Test Mapping table in the story file. Every acceptance criterion must have at least one test that directly verifies it. Tests should be written from the AC contract, not just from the implementation.

## Assertion Strength

- Verify ALL relevant fields, not just the primary one
- Use `assert_called_once_with(...)` not `assert_called_once()` — verify arguments, not just call count
- Add negative assertions where appropriate (e.g., a passing case produces zero results)
- Include `assert len(results) == N` to catch deduplication breakage
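A minimal sketch of these rules in one test — `notify_failures`, its record shape, and `sender` are hypothetical names invented for illustration:

```python
from unittest.mock import MagicMock


def notify_failures(records: list[dict], sender) -> list[int]:
    """Hypothetical unit under test: forward failing records, dedup by id."""
    seen: set[int] = set()
    notified: list[int] = []
    for record in records:
        if record["status"] == "fail" and record["id"] not in seen:
            seen.add(record["id"])
            sender.send(record["id"], status="fail")
            notified.append(record["id"])
    return notified


def test_notify_failures_verifies_arguments_and_count() -> None:
    sender = MagicMock()
    results = notify_failures(
        [
            {"id": 1, "status": "fail"},
            {"id": 1, "status": "fail"},  # duplicate -- must be deduped
            {"id": 2, "status": "pass"},
        ],
        sender,
    )
    # Argument-level assertion -- catches wrong payloads, not just wrong counts
    sender.send.assert_called_once_with(1, status="fail")
    # Length assertion catches deduplication breakage
    assert len(results) == 1
    # Negative assertion: the passing record produced no result
    assert 2 not in results
```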

## Edge Case Coverage

- Proactively add boundary tests: exact threshold values, empty inputs, deeply nested structures
- Test mix scenarios: some passing + some failing in same input
- Test skip conditions: inputs that should produce zero results
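The same boundary cases as a sketch for a hypothetical `over_threshold` helper (name and threshold are invented for illustration):

```python
def over_threshold(values: list[int], limit: int = 10) -> list[int]:
    """Hypothetical unit under test: keep values strictly above the limit."""
    return [v for v in values if v > limit]


def test_over_threshold_boundaries() -> None:
    cases = [
        ([], []),                     # empty input -> zero results
        ([10], []),                   # exact threshold value -- the boundary
        ([11], [11]),                 # just over the boundary
        ([5, 10, 11, 99], [11, 99]),  # mix: passing + failing in same input
    ]
    for values, expected in cases:
        assert over_threshold(values) == expected
```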

## Task Tracking Accuracy

- Before marking a subtask `[x]` done, verify the code change actually exists
- Run the relevant test to confirm the change is real
- Do not mark implementation tasks done based on intent — only on verified code
85 changes: 85 additions & 0 deletions .claude/rules/pr-review-comments.md
@@ -0,0 +1,85 @@
# PR Review Comment Handling Rules

When asked to review, address, or respond to PR review comments, follow this workflow exactly.

## Step 1: Read the PR Description and Comments

Always start by reading the PR description and all review comments together. This provides the full context for evaluating feedback.

```bash
# Get PR description (what we're trying to do)
gh pr view <PR_NUMBER>

# Get ALL review comments (inline code comments from reviewers)
gh api repos/{owner}/{repo}/pulls/<PR_NUMBER>/comments --jq '.[] | {id, path, line, body, user: .user.login, in_reply_to_id, created_at}'

# Get review summaries (top-level review verdicts)
gh api repos/{owner}/{repo}/pulls/<PR_NUMBER>/reviews --jq '.[] | {id, state, body, user: .user.login}'
```

Use the actual owner/repo from `gh repo view --json owner,name` if needed.

## Step 2: Evaluate Each Comment

For each review comment, categorize it:

- **Actionable**: Real issues — bugs, missing edge cases, architecture violations, security concerns, test gaps. Implement these.
- **Style/ticky-tacky**: Cosmetic preferences that conflict with project conventions in `.claude/rules/` or add no value. Push back politely with rationale.
- **Questions**: Clarification requests. Answer with context from the code.
- **Suggestions**: Optional improvements. Present to the user for decision.

Always evaluate against:
1. The PR description (what problem are we solving?)
2. Project rules in `.claude/rules/` and `CLAUDE.md`
3. Whether the suggestion is in scope for this PR

## Step 3: Present Findings to User

Before making changes, present a summary:

```
## PR #X Review Comments

### Will implement:
- [comment by @reviewer] "feedback text" -> what we'll change

### Pushing back on:
- [comment by @reviewer] "feedback text" -> why (cite project convention)

### Need your input:
- [comment by @reviewer] "feedback text" -> options A vs B
```

Wait for user confirmation before proceeding.

## Step 4: Make Changes

After user approval, implement the agreed changes. The user will commit and push.

## Step 5: Reply to Comments

After changes are pushed, reply to each review comment thread.

### Replying to a review comment thread

A review comment has an `id`. To reply **in the same thread**, use the reply endpoint:

```bash
gh api repos/{owner}/{repo}/pulls/<PR_NUMBER>/comments/<COMMENT_ID>/replies \
-f body="<your reply>"
```

**IMPORTANT:** `<COMMENT_ID>` is the `id` of the **top-level** comment in the thread (the original reviewer comment), NOT `in_reply_to_id`. If a comment has `in_reply_to_id: null`, it IS the top-level comment — use its `id`. If it has `in_reply_to_id: 12345`, reply to `12345` instead.
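That resolution rule can be written down as a tiny helper — a sketch over the JSON fields fetched in Step 1:

```python
def reply_target_id(comment: dict) -> int:
    """Return the thread-root comment id to use in the /replies endpoint.

    in_reply_to_id is None -> this comment IS the root: reply to its own id.
    in_reply_to_id is set  -> reply to that root id instead.
    """
    root_id = comment.get("in_reply_to_id")
    return comment["id"] if root_id is None else root_id
```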

### Common mistakes to avoid

- **WRONG**: `gh pr comment <PR_NUMBER> -b "..."` — this creates an issue-level comment, NOT a thread reply
- **WRONG**: `gh api repos/{owner}/{repo}/issues/<PR_NUMBER>/comments` — this is issue comments, NOT review thread comments
- **RIGHT**: `gh api repos/{owner}/{repo}/pulls/<PR_NUMBER>/comments/<ROOT_COMMENT_ID>/replies -f body="..."`

### Reply content guidelines

- Keep replies concise (1-3 sentences)
- If implemented: "Addressed in latest push — [brief description of what changed]."
- If pushing back: "Keeping as-is — [rationale citing project convention]."
- If clarifying: Answer the question directly, reference code if helpful.
17 changes: 17 additions & 0 deletions .claude/rules/pull-requests.md
@@ -2,6 +2,23 @@

When creating a pull request, you MUST follow these rules exactly.

## Branch Naming Convention

Feature branches follow the pattern: `type/story-key`

- **Pattern**: `type/epic-story-description` (e.g., `feat/2-1-scorer-protocol`)
- **Type**: Matches conventional commit types: `feat | fix | docs | refactor | test | chore | perf`
- **Story key**: Taken directly from sprint-status.yaml (e.g., `2-1-scorer-protocol`)
- The branch name carries the story ID — no need to repeat it in commit messages or PR titles

Examples:
- `feat/2-1-scorer-protocol` (feature story)
- `fix/3-2-reflection-timeout` (bug fix story)
- `refactor/4-1-engine-extraction` (refactoring story)
- `chore/1-1-project-scaffold` (infrastructure story)

The `dev-story` workflow creates branches automatically using this convention.

## Always Draft

Always create PRs as **draft** using `--draft` flag. Ready PRs trigger automated code review, so PRs must stay draft until the author explicitly marks them ready.
87 changes: 87 additions & 0 deletions .claude/rules/pytest.md
@@ -0,0 +1,87 @@
---
globs: ["**/*test*.py", "**/test_*.py", "**/tests/**/*.py", "**/conftest.py"]
---

# Pytest Best Practices

## Test File and Function Naming

### File Naming Conventions
- **MUST** use `test_*.py` or `*_test.py` pattern for pytest to discover tests
- Place test files in dedicated `tests/` directory at project root
- Mirror application structure in test directory
- Example: Testing `src/gepa_adk/adapters/stoppers/signal.py` -> `tests/unit/adapters/stoppers/test_signal.py`

### Test Function Naming
- **MUST** prefix test functions with `test_`
- Use descriptive names following pattern: `test_<what>_<condition>_<expected_result>`
- Examples:
- `test_evolution_engine_stops_at_max_generations()`
- `test_pareto_frontier_rejects_dominated_candidates()`
- `test_reflection_agent_extracts_feedback()`
- Avoid generic names like `test1()` or `test_function()`

### Test Class Naming
- **MUST** prefix test classes with `Test` (capital T)
- Class methods **MUST** start with `test_`
- No `__init__` method in test classes
- Use classes to group related tests for a specific feature or module
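An illustrative grouping that follows all three conventions — the Pareto domination check is inlined as a stand-in for real domain code:

```python
class TestParetoFrontier:
    """Related tests for one feature, grouped in a Test-prefixed class (no __init__)."""

    @staticmethod
    def _dominated(candidate: tuple[int, int], point: tuple[int, int]) -> bool:
        # A candidate is dominated if it is no better on every objective
        return all(c <= p for c, p in zip(candidate, point))

    def test_pareto_frontier_rejects_dominated_candidates(self) -> None:
        assert self._dominated((2, 4), (3, 5))

    def test_pareto_frontier_keeps_non_dominated_candidates(self) -> None:
        assert not self._dominated((4, 2), (3, 5))
```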

## Test Organization and Structure

### Testing Pyramid
- **MOST tests**: Unit tests (fast, isolated — test domain logic, ports, adapters in isolation)
- **FEWER tests**: Integration tests (cross-layer wiring with real domain + engine)
- **Markers**: `@pytest.mark.unit`, `@pytest.mark.contract`, `@pytest.mark.integration`, `@pytest.mark.api`

### Organization Principles
- Mirror application code structure in test directories
- Separate tests by type (unit/contract/integration) using subdirectories
- Each test should verify ONE specific aspect of code
- Keep tests independent - no execution order dependencies
- Tests should be fast, deterministic, and readable

## Fixtures Best Practices

### Fixture Scopes
- **function** (default): Created/destroyed for each test - maximum isolation
- **class**: Shared across all methods in a test class
- **module**: Shared across all tests in a module - for expensive setup
- **session**: Shared across entire test session - for very expensive operations (never for mutable fixtures)

### conftest.py Usage
- Place shared fixtures in `conftest.py` for automatic discovery (no imports needed)
- Root-level `tests/conftest.py` for global fixtures
- Tier-specific `conftest.py` for tier-specific fixtures

### Fixture Anti-Patterns to Avoid
- Overloaded fixtures with too many responsibilities
- Hardcoded values (use factory pattern instead)
- Global state leakage without proper cleanup
- Deep fixture dependency chains
- Duplicating fixtures across test files (use conftest.py)
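A factory sketch that avoids the hardcoded-value anti-pattern — the `candidate` payload shape is invented for illustration:

```python
import pytest


def make_candidate(score: float = 0.0, generation: int = 0) -> dict:
    """Plain factory: each call builds a fresh payload with caller-chosen values."""
    return {"score": score, "generation": generation}


@pytest.fixture
def candidate_factory():
    """Expose the factory via conftest.py so tests request it by name."""
    return make_candidate


def test_best_candidate_outscores_baseline(candidate_factory) -> None:
    best = candidate_factory(score=0.9, generation=3)
    baseline = candidate_factory()
    assert best["score"] > baseline["score"]
```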

## Assertions and Testing Patterns

### Assertion Principles
- Use plain `assert` statements - pytest provides detailed introspection
- Test public interfaces, not private methods or implementation details

### Mocking and Patching
- **Key Rule**: Patch where the object is USED, not where it's DEFINED
- Use `autospec=True` to respect method signatures
- Mock external dependencies (LLM adapters, file I/O), not internal logic
- Prefer real objects with in-memory adapters over mocks when testing hex architecture
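The patch-where-used rule in a self-contained sketch: this module plays the role of an `engine.py` that imported `complete` from an adapter, so the patch target is this module's own reference, not the defining module (all names are hypothetical):

```python
from unittest.mock import patch


def complete(prompt: str) -> str:
    """Stand-in for a name imported from an adapter module (where it's DEFINED)."""
    raise RuntimeError("real LLM call -- must be patched in tests")


def run_engine(prompt: str) -> str:
    """Uses the local reference to complete() -- so patch it HERE."""
    return complete(prompt).upper()


def test_patch_where_used() -> None:
    # __name__ is this module, which is where complete() is USED
    with patch(f"{__name__}.complete", autospec=True, return_value="ok") as mock:
        assert run_engine("hi") == "OK"
        # autospec=True makes the mock reject calls with a wrong signature
        mock.assert_called_once_with("hi")
```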

## Quick Reference

```bash
uv run pytest # All tests (excludes api by default)
uv run pytest tests/unit # Unit tests only
uv run pytest tests/contracts # Contract tests only
uv run pytest -k "pareto" # Tests matching keyword
uv run pytest -m "not slow" # Exclude slow tests
uv run pytest -m api # Run API tests (real LLM calls)
uv run pytest --lf # Run last failed tests
uv run pytest -x # Stop on first failure
```
18 changes: 18 additions & 0 deletions .claude/rules/python.md
@@ -4,6 +4,15 @@ globs: ["*.py"]

# Python Conventions (gepa-adk)

## Semantic Rules (not enforced by ruff/ty)

- **Every file** starts with `from __future__ import annotations` — no exceptions
- **No mutable defaults**: Never use `[]` or `{}` as function default arguments
- **No bare exceptions**: Always catch specific exception types, include context in error messages
- **No `assert` in production code**: `assert` is for tests only. Production code must use proper error handling
- **f-strings for formatting**, `%`-formatting for logger calls only
- **Comments explain WHY**, not WHAT — assume reader knows Python
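Two of these rules as a compact sketch — `append_event`, `read_config`, and the config path are hypothetical:

```python
from __future__ import annotations


def append_event(event: str, log: list[str] | None = None) -> list[str]:
    """None sentinel instead of a mutable [] default -- no shared state across calls."""
    if log is None:
        log = []
    log.append(event)
    return log


def read_config(path: str) -> str:
    """Catch the specific exception type and include context -- never a bare except."""
    try:
        with open(path, encoding="utf-8") as fh:
            return fh.read()
    except FileNotFoundError as exc:
        raise RuntimeError(f"config file missing: {path}") from exc
```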

## Logging
- `logger = structlog.get_logger(__name__)` at module level — never `logging.getLogger()`
- Event names: dot-notation for structured (`config.reflection_prompt.empty`), plain English for operational
@@ -34,6 +43,15 @@
- Section order: Summary → `Args:` → `Returns:` → `Raises:` → `Yields:`
- Module docstrings: Summary + `Attributes:` listing `__all__` contents

## Module Size Guidance

When a source module exceeds ~500 lines or contains 3+ distinct concerns, consider extracting into a sub-package:

- Pattern: `domain/strategy.py` -> `domain/strategy/` with `__init__.py` re-exporting public API
- Move each distinct concern into its own internal module
- Same principle applies to test files — split by concern into dedicated files
- Apply when natural (during a story that touches the module), not as forced refactoring

## Type Checking
- ty uses `# ty: ignore[rule]` — NOT `# type: ignore`
- For test files, use `pyproject.toml` `[tool.ty.overrides]` instead of inline suppressions
56 changes: 56 additions & 0 deletions .claude/rules/sonarqube.md
@@ -0,0 +1,56 @@
# SonarQube Integration

This project has a SonarQube Community Edition instance on the local LAN, connected via the official `mcp/sonarqube` MCP server (global config in `~/.claude.json`).

## Project Details

- **Project key**: `gepa-adk`
- **Server**: SonarQube Community Edition on the local LAN (URL varies by machine)
- **MCP server**: Official SonarSource `mcp/sonarqube` image, configured globally in `~/.claude.json`

## Network & Environment

- The SonarQube server runs on a LAN machine — not always reachable (user works across multiple machines, sometimes off-LAN)
- If MCP tools return connection errors, the server is likely unreachable — don't retry, just note it and move on
- When running on the machine that hosts the SonarQube server itself, the URL may be `localhost:9000` instead of the LAN IP
- Shell profile env vars: `$SONAR_TOKEN` (scanner auth) and `$SONARQUBE_URL` (server URL) are set in `~/.bashrc` — use these for scanner commands
- The MCP server token and URL also live in `~/.claude.json`, but **never read tokens from that file** — pass `-e SONAR_TOKEN` from the environment instead, so the secret never appears in command history

## Scanning (Manual)

Community Edition has no branch analysis and no automatic scan triggers. Scans must be run manually after merging to main.

**Project config**: `sonar-project.properties` at project root defines `sonar.projectKey`, `sonar.sources`, `sonar.python.version`, `sonar.python.coverage.reportPaths`, and `sonar.sourceEncoding`. The scanner reads this automatically — no `-D` flags needed except `sonar.host.url`.

**Scan command** (run from project root on main branch):

```bash
# Optional: generate coverage report first
uv run pytest --cov=gepa_adk --cov-report=xml

# Run scanner ($SONARQUBE_URL and $SONAR_TOKEN set in ~/.bashrc)
podman run --rm \
-v "$(pwd):/usr/src:z" \
--userns=keep-id \
--network host \
-e SONAR_TOKEN \
docker.io/sonarsource/sonar-scanner-cli:latest \
-Dsonar.host.url="$SONARQUBE_URL"
```

## When to Use

- After completing code changes, use `analyze_code_snippet` to check modified files for issues
- Use `search_sonar_issues_in_projects` with `projects: ["gepa-adk"]` to review open issues
- Use `get_project_quality_gate_status` with `projectKey: "gepa-adk"` to check gate status
- After a PR merges to main, offer to run a scan if the user is on-LAN

## Known Issue Patterns

- The dominant finding is **cognitive complexity** (`python:S3776`, threshold 15). When refactoring functions, aim to stay under 15. When using `analyze_code_snippet` for fast feedback, target CC ≤ 12 to buffer against divergence between the MCP tool and the full scanner.

## Important Notes

- SonarQube dashboard results lag behind local changes — don't verify fixes via the dashboard API immediately after editing; use `analyze_code_snippet` for fast feedback during development
- The server may not be reachable in every session (off-LAN, server down)
3 changes: 3 additions & 0 deletions .claude/rules/testing.md
@@ -61,8 +61,11 @@ Every test file sets `pytestmark = pytest.mark.<tier>` at module level.

## Mocking
- `pytest-mock` (`mocker` fixture) for patching — not raw `unittest.mock.patch`
- **Key Rule**: Patch where the object is USED, not where it's DEFINED
- Use `autospec=True` to respect method signatures
- `AsyncMock` for async methods, `MagicMock` for sync
- Mock at the boundary (ports), not deep internals
- Prefer real objects with in-memory adapters over mocks when testing hex architecture
- Accessing private attributes (e.g., `stopper._stop_requested = True`) is a last resort for signal/OS interaction tests — not a general pattern

## Async Tests
1 change: 1 addition & 0 deletions .github/workflows/boundaries.yml
@@ -23,6 +23,7 @@ jobs:
enable-cache: true
- run: uv python install 3.12
- run: uv sync --locked --dev
- run: uv run lint-imports
- run: bash scripts/check_boundaries.sh
- run: uv run python scripts/check_protocol_coverage.py
continue-on-error: true # TODO: remove after Story 3.2 fills contract test gaps
1 change: 1 addition & 0 deletions .github/workflows/ci.yml
@@ -27,6 +27,7 @@ jobs:
- run: uv run ruff check .
- run: uv run ruff format --check .
- run: uv run uv-secure
- run: uv run lint-imports
- run: uv cache prune --ci
if: always()
continue-on-error: true
7 changes: 7 additions & 0 deletions .pre-commit-config.yaml
@@ -42,6 +42,13 @@ repos:
types: [python]
pass_filenames: false

- id: lint-imports
name: import-linter
entry: uv run lint-imports
language: system
types: [python]
pass_filenames: false

- id: docvet
name: docvet check
entry: uv run docvet check --staged