diff --git a/.github/workflows/pr-preview.yml b/.github/workflows/pr-preview.yml
index 228f2f05d..977712d6a 100644
--- a/.github/workflows/pr-preview.yml
+++ b/.github/workflows/pr-preview.yml
@@ -48,7 +48,7 @@ jobs:
           python -m pip install --upgrade pip
           pip install . || true
           pip install -r requirements/requirements-base.txt
-          pip install pytest
+          pip install pytest pytest-asyncio
 
       - name: Run unit tests
         run: python -m pytest tests/unit/ -v --tb=short
@@ -255,12 +255,39 @@ jobs:
           path: playwright-report/
           retention-days: 14
 
-      # ── Cleanup ─────────────────────────────────────────────────────────
-      - name: Cleanup smoke deployment
+      # ── Copilot SDK smoke (BYOK via local Ollama) ─────────────────────
+      # NOTE: The Copilot SDK requires GitHub auth even in BYOK mode.
+      # This step validates build + boot but react_smoke will time out
+      # until CI has a Copilot-authenticated token.  Non-fatal for now.
+      - name: Cleanup CMSCompOps deployment before Copilot smoke
         if: ${{ always() }}
         run: |
           yes | archi delete --name ci-${{ github.run_id }} || true
 
+      - name: Run Copilot SDK smoke deployment
+        continue-on-error: true
+        uses: ./.github/actions/run-smoke
+        with:
+          deployment-name: ci-copilot-${{ github.run_id }}
+          config-path: tests/pr_preview_config/pr_preview_copilot_config.yaml
+          config-destination: configs/ci/ci_copilot_config.generated.yaml
+          services: chatbot
+          hostmode: "true"
+          wait-url: http://localhost:2786/api/health
+          base-url: http://localhost:2786
+          extra-env: |
+            ARCHI_COMPOSE_UP_FLAGS=--build --force-recreate
+            SMOKE_OLLAMA_MODEL=qwen3:4b
+            SMOKE_OLLAMA_URL=http://localhost:11434
+            SMOKE_OLLAMA_HOST=http://localhost:11434
+          use-podman: "false"
+
+      # ── Cleanup ─────────────────────────────────────────────────────────
+      - name: Cleanup smoke deployments
+        if: ${{ always() }}
+        run: |
+          yes | archi delete --name ci-copilot-${{ github.run_id }} || true
+
       - name: Cleanup local base images
         if: ${{ always() && needs.build-base-images.outputs.changed == 'true' }}
         run: |
diff --git a/.gitignore b/.gitignore
index 8386cf17a..ec0ffcf0c 100644
--- a/.gitignore
+++ b/.gitignore
@@ -37,3 +37,4 @@ git/
 local_files/
 raw_local_files/
 websites/
+.env_tmp_smoke
diff --git a/docs/multi-backend-agent-recommendation.md b/docs/multi-backend-agent-recommendation.md
new file mode 100644
index 000000000..1710b4f23
--- /dev/null
+++ b/docs/multi-backend-agent-recommendation.md
@@ -0,0 +1,89 @@
+# Multi-Backend Agent Abstraction: Recommendation
+
+**Date:** March 25, 2026
+**Question:** Should A2rchi support a general agent backend (Copilot SDK, Claude Agent SDK, LangChain) or lock into the Copilot SDK?
+
+**Verdict: Lock into Copilot SDK. A general abstraction is feasible but not advisable.**
+
+## Side-by-Side Comparison
+
+| Dimension | Copilot SDK | Claude Agent SDK | LangChain |
+|---|---|---|---|
+| **Runtime** | CLI subprocess (`copilot --headless`) | CLI subprocess (`claude` CLI) | In-process graph |
+| **Tool definition** | `defineTool(name, {description, parameters: JSONSchema, handler})` | `@tool(name, desc, schema)` → must return MCP `{"content": [...]}` | `@tool` decorator, returns `str` |
+| **Streaming** | Event callbacks: `session.on("event_type", handler)` | Async iterator: `async for msg in query()` | State generator: `for chunk in agent.stream()` |
+| **Session** | `createSession()` → `sendAndWait()`, managed by CLI | `query()` (stateless) or `ClaudeSDKClient` (sessioned), managed by CLI | `invoke(state)`, state is external (you manage it) |
+| **Models** | GPT-4.1 default, BYOK for OpenAI/Azure/Anthropic/Google/Mistral | Claude only, BYOK via Bedrock/Vertex/Azure AI Foundry | Any provider via `init_chat_model()` |
+| **Hooks** | `onPreToolUse`, `onPostToolUse`, session lifecycle | `PreToolUse`, `PostToolUse`, `PermissionRequest`, etc. | Middleware: `@before_model`, `@after_model`, `@wrap_tool_call` |
+| **Auth** | GitHub OAuth, env vars, BYOK | Anthropic API key, Bedrock, Vertex | Per-model provider keys |
+
+## Key Issues With a General Abstraction
+
+### 1. Tool return format mismatch
+
+Claude Agent SDK enforces MCP wire format — tools must return `{"content": [{"type": "text", "text": "..."}]}`. Copilot tools return any serializable value. LangChain tools return strings. Every tool needs a per-backend wrapper that normalizes both input schemas and return formats. Our 7 tools become 21 adapter functions.
+
+### 2. Both Copilot and Claude SDKs are CLI wrappers
+
+They spawn a subprocess and communicate over stdio/TCP. LangChain runs fully in-process. This means:
+
+- Two separate CLI binaries in your Docker image
+- Two different auth flows (GitHub OAuth vs Anthropic API key)
+- Two different process lifecycle managers
+- LangChain requires none of this (but has completely different plumbing)
+
+### 3. Three incompatible streaming models
+
+Our existing `copilot_event_adapter.py` is ~400 lines that translate Copilot's event callbacks into `PipelineOutput` objects. We'd need an equivalent adapter for each backend — each handling different event types, different data shapes, different async patterns (callbacks vs async iterators vs sync generators).
+
+### 4. Claude Agent SDK BYOK is provider-level, not model-level
+
+The Claude Agent SDK does support BYOK via Amazon Bedrock, Google Vertex AI, and Microsoft Azure AI Foundry. But this means "bring your own cloud credentials to access **Claude models**" — not "bring your own key to use any model." You're still restricted to Claude (Sonnet, Opus, Haiku). Copilot SDK's BYOK lets you swap between entirely different model families (GPT-4.1, Claude, Gemini, Mistral). A2rchi's multi-provider model selection would not work through the Claude Agent SDK.
+
+### 5. Session lifecycle is fundamentally different
+
+Copilot and Claude manage sessions inside their CLI process (persist, resume, fork). LangChain has no built-in session — you provide state via checkpointers. Abstracting over "session" means accepting the lowest common denominator: no resume, no persistence, no fork.
+
+### 6. LCD strips unique value from each SDK
+
+- **Copilot:** Custom agents, skills, system message section overrides (replace/remove/append per section) — can't express through an abstraction
+- **Claude:** Permission system, sandbox, file checkpointing, subagents — not available in others
+- **LangChain:** Middleware pipeline, dynamic model selection, structured output strategies — completely different paradigm
+
+## The Math
+
+Each additional backend requires:
+
+| Component | LOC |
+|---|---|
+| Event/streaming adapter | ~400 |
+| Tool wrappers (7 tools × format normalization) | ~200 |
+| Session lifecycle management | ~300 |
+| Auth/config integration | ~150 |
+| **Total per backend** | **~1,050** |
+
+Plus ongoing maintenance when any SDK ships breaking changes.
+
+## Why the Architecture Already Supports a Future Pivot
+
+The current architecture is already well-separated:
+
+- **`archi.py`** is 100% backend-agnostic — it calls `pipeline.stream()` and validates `PipelineOutput`
+- The pipeline factory (`getattr(archiPipelines, class_name)`) lets you add a `LangChainAgentPipeline` or `ClaudeAgentPipeline` as a new pipeline class without touching any shared code
+- **`PipelineOutput`** is the universal contract — any new backend just needs to yield these
+
+No premature abstraction layer needed. When the time comes, you add a new pipeline class.
+
+## If You Ever Need a Second Backend
+
+**LangChain is the better addition** (not Claude Agent SDK) because:
+
+1. It runs in-process (no CLI dependency)
+2. It supports any model provider
+3. Its `@tool` decorator is closest to Copilot's `defineTool`
+
+But even then, it's ~1,000+ LOC of glue code for marginal value — the same users who want "Anthropic models" already get them through Copilot SDK's BYOK.
+
+## Recommendation
+
+Stay on the Copilot SDK. Build the second backend only when a concrete use case demands it — the architecture is ready.
diff --git a/examples/agents/cms-comp-ops.md b/examples/agents/cms-comp-ops.md
index 0b4718608..e24fd4640 100644
--- a/examples/agents/cms-comp-ops.md
+++ b/examples/agents/cms-comp-ops.md
@@ -4,6 +4,8 @@ tools:
   - search_vectorstore_hybrid
   - search_local_files
   - search_metadata_index
+  - list_metadata_schema
+  - fetch_catalog_document
 ---
 
 You are the CMS Comp Ops assistant. You help with operational questions, troubleshooting,
diff --git a/pyproject.toml b/pyproject.toml
index f5136f334..259b8b9e2 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -2,7 +2,7 @@
 name = "archi"
 version = "1.2.4"
 description = "An AI Augmented Research Chat Intelligence (archi)"
-requires-python = ">=3.7"
+requires-python = ">=3.10"
 authors = [
     {name="Pietro Lugato", email="pmlugato@mit.edu"},
     {name="Julius Heitkoetter", email="juliush@mit.edu"},
@@ -14,7 +14,7 @@ authors = [
 ]
 dependencies = [
     "pyyaml==6.0.1",
-    "click==8.1.7",
+    "click>=8.1.7",
     "jinja2==3.1.3",
     "requests==2.31.0",
     "podman-compose==1.4.0",
@@ -48,6 +48,7 @@ build-backend = "setuptools.build_meta"
 [tool.pytest.ini_options]
 testpaths = ["tests/unit"]
 addopts = "-v --tb=short"
+asyncio_mode = "auto"
 
 [project.urls]
 "Homepage" = "https://github.com/archi-physics/archi"
diff --git a/requirements/requirements-base.txt b/requirements/requirements-base.txt
index 602eed1dd..05d7e4a99 100644
--- a/requirements/requirements-base.txt
+++ b/requirements/requirements-base.txt
@@ -18,6 +18,7 @@ httptools==0.6.1
 httpx==0.27.2
 humanfriendly==10.0
 croniter==2.0.5
+github-copilot-sdk>=0.2.0
 langgraph==1.0.2
 langchain-mcp-adapters==0.1.11
 langchain==1.0.3
diff --git a/src/archi/pipelines/__init__.py b/src/archi/pipelines/__init__.py
index 3a04015b8..547540949 100644
--- a/src/archi/pipelines/__init__.py
+++ b/src/archi/pipelines/__init__.py
@@ -1,11 +1,12 @@
 """Pipeline package exposing the available pipeline classes."""
 
+from .agents.base_react import BaseReActAgent
+from .agents.cms_comp_ops_agent import CMSCompOpsAgent
 from .classic_pipelines.base import BasePipeline
 from .classic_pipelines.grading import GradingPipeline
 from .classic_pipelines.image_processing import ImageProcessingPipeline
 from .classic_pipelines.qa import QAPipeline
-from .agents.base_react import BaseReActAgent
-from .agents.cms_comp_ops_agent import CMSCompOpsAgent
+from .copilot_agents.copilot_agent import CopilotAgentPipeline
 
 __all__ = [
     "BasePipeline",
@@ -14,4 +15,5 @@
     "QAPipeline",
     "BaseReActAgent",
     "CMSCompOpsAgent",
+    "CopilotAgentPipeline",
 ]
diff --git a/src/archi/pipelines/agents/base_react.py b/src/archi/pipelines/agents/base_react.py
index 43e08f604..5f42bfb1b 100644
--- a/src/archi/pipelines/agents/base_react.py
+++ b/src/archi/pipelines/agents/base_react.py
@@ -19,7 +19,7 @@
 from src.archi.providers.base import ProviderType
 from src.archi.utils.output_dataclass import PipelineOutput
 from src.archi.pipelines.agents.utils.run_memory import RunMemory
-from src.archi.pipelines.agents.utils.mcp_utils import AsyncLoopThread
+from src.archi.utils.async_loop import AsyncLoopThread
 from src.archi.pipelines.agents.tools import initialize_mcp_client
 from src.utils.logging import get_logger
 
@@ -79,6 +79,10 @@ def create_run_memory(self) -> RunMemory:
         """Instantiate a fresh run memory for an agent run."""
         return RunMemory()
 
+    def supports_persisted_session_id(self) -> bool:
+        """Classic ReAct agents are stateless beyond the provided history."""
+        return False
+
     def start_run_memory(self) -> RunMemory:
         """Create and store the active memory for the current run."""
         memory = self.create_run_memory()
diff --git a/src/archi/pipelines/agents/tools/local_files.py b/src/archi/pipelines/agents/tools/local_files.py
index 9190bb5e2..0508ee9ac 100644
--- a/src/archi/pipelines/agents/tools/local_files.py
+++ b/src/archi/pipelines/agents/tools/local_files.py
@@ -108,7 +108,13 @@ def search(
             params=params,
             headers=self._headers,
             timeout=self.timeout,
+            allow_redirects=False,
         )
+        if resp.is_redirect or resp.status_code in (301, 302, 303, 307, 308):
+            raise RuntimeError(
+                f"Catalog API redirected to {resp.headers.get('Location', '?')} — "
+                "check DM_API_TOKEN or data_manager auth config"
+            )
         resp.raise_for_status()
         data = resp.json()
         return data.get("hits", []) or []
@@ -119,9 +125,15 @@ def get_document(self, resource_hash: str, *, max_chars: int = 4000) -> Optional
             params={"max_chars": max_chars},
             headers=self._headers,
             timeout=self.timeout,
+            allow_redirects=False,
         )
         if resp.status_code == 404:
             return None
+        if resp.is_redirect or resp.status_code in (301, 302, 303, 307, 308):
+            raise RuntimeError(
+                f"Catalog API redirected to {resp.headers.get('Location', '?')} \u2014 "
+                "check DM_API_TOKEN or data_manager auth config"
+            )
         resp.raise_for_status()
         return resp.json()
 
@@ -130,7 +142,13 @@ def schema(self) -> Dict[str, object]:
             f"{self.base_url}/api/catalog/schema",
             headers=self._headers,
             timeout=self.timeout,
+            allow_redirects=False,
         )
+        if resp.is_redirect or resp.status_code in (301, 302, 303, 307, 308):
+            raise RuntimeError(
+                f"Catalog API redirected to {resp.headers.get('Location', '?')} \u2014 "
+                "check DM_API_TOKEN or data_manager auth config"
+            )
         resp.raise_for_status()
         return resp.json()
 
diff --git a/src/archi/pipelines/agents/tools/retriever.py b/src/archi/pipelines/agents/tools/retriever.py
index 149aaf902..c99bd5c21 100644
--- a/src/archi/pipelines/agents/tools/retriever.py
+++ b/src/archi/pipelines/agents/tools/retriever.py
@@ -62,7 +62,7 @@ def _format_documents_for_llm(
 def create_retriever_tool(
     retriever: BaseRetriever,
     *,
-    name: str = "search_knowledge_base",
+    name: str = "search_vectorstore_hybrid",
     description: Optional[str] = None,
     max_documents: int = 4,
     max_chars: int = 800,
diff --git a/src/archi/pipelines/copilot_agents/__init__.py b/src/archi/pipelines/copilot_agents/__init__.py
new file mode 100644
index 000000000..7967324d4
--- /dev/null
+++ b/src/archi/pipelines/copilot_agents/__init__.py
@@ -0,0 +1,5 @@
+"""Copilot SDK agent package."""
+
+from .copilot_agent import CopilotAgentPipeline
+
+__all__ = ["CopilotAgentPipeline"]
diff --git a/src/archi/pipelines/copilot_agents/copilot_agent.py b/src/archi/pipelines/copilot_agents/copilot_agent.py
new file mode 100644
index 000000000..e5c50c5bc
--- /dev/null
+++ b/src/archi/pipelines/copilot_agents/copilot_agent.py
@@ -0,0 +1,770 @@
+"""CopilotAgentPipeline — agent pipeline powered by the GitHub Copilot SDK.
+
+Replaces ``BaseReActAgent`` / ``CMSCompOpsAgent`` with a single pipeline
+class that creates per-request Copilot SDK sessions.  Streaming events are
+translated to ``PipelineOutput`` by :class:`CopilotEventAdapter`.
+
+Design decisions implemented here:
+  1  — One CopilotClient at init, per-request sessions, AsyncLoopThread bridge
+  1b — invoke()/stream()/astream() signatures match BaseReActAgent
+  3  — Event adapter maps SDK events → PipelineOutput
+  4  — BYOK-first provider mapping
+  8  — MCP config passthrough (archi.mcp_servers → SDK mcpServers)
+  13 — Context management delegated to Copilot CLI infinite sessions
+  17 — get_tool_registry()/get_tool_descriptions() from TOOL_REGISTRY
+  SP — Session persistence via resume_session() (stored pipeline_session_id)
+  CM — System message customize mode (keep SDK safety defaults)
+  EH — onErrorOccurred hook for auto-retry and friendly errors
+  ET — Tool lifecycle via streaming events (native toolCallId)
+"""
+
+from __future__ import annotations
+
+from typing import (Any, AsyncIterator, Callable, Dict, Iterator, List,
+                    Optional, Sequence, Tuple)
+
+from src.archi.pipelines.copilot_agents.copilot_event_adapter import CopilotEventAdapter
+from src.archi.utils.async_loop import AsyncLoopThread
+from src.archi.utils.output_dataclass import PipelineOutput
+from src.utils.logging import get_logger
+
+logger = get_logger(__name__)
+
+
+def _get_copilot_client_cls():
+    """Lazy import so units that don't have the SDK installed can still
+    import this module for ``get_tool_registry()`` / ``get_tool_descriptions()``."""
+    from copilot import CopilotClient
+
+    return CopilotClient
+
+
+def _build_tool_restriction_kwargs(custom_tools: list) -> Dict[str, list[str]]:
+    """Return Copilot session tool restrictions.
+
+    Uses ``available_tools`` as an **allowlist** containing only the names of
+    our custom Archi tools.  This blocks every SDK built-in tool (bash, edit,
+    grep, sql, report_intent, etc.) without needing to enumerate them.
+
+    The SDK docs state that ``available_tools`` **takes precedence** over
+    ``excluded_tools``.  An empty list means "allow nothing".  When Archi has
+    no custom tools, we pass an empty list — which correctly disables all tools.
+    """
+    allowed = [t.name for t in custom_tools]
+    return {
+        "available_tools": allowed,
+    }
+
+
+# ── Provider mapping (decision 4) ────────────────────────────────────────
+
+_PROVIDER_TYPE_MAP = {
+    "openai": "openai",
+    "anthropic": "anthropic",
+    "openrouter": "openai",  # OpenRouter is OpenAI-compatible
+    "local": "openai",  # Ollama / vLLM expose an OpenAI-compatible API
+}
+
+
+def _build_sdk_provider(
+    provider_name: str,
+    model_id: str,
+    providers_config: dict,
+    *,
+    api_key: Optional[str] = None,
+) -> dict:
+    """Translate A2rchi provider config → Copilot SDK ``provider`` dict.
+
+    Parameters
+    ----------
+    provider_name:
+        One of ``"openai"``, ``"anthropic"``, ``"openrouter"``, ``"local"``.
+    model_id:
+        The model identifier (e.g. ``"gpt-4o"``, ``"claude-sonnet-4-20250514"``).
+    providers_config:
+        ``services.chat_app.providers`` config section.
+    api_key:
+        Optional per-user BYOK key.  Falls back to the provider's env var.
+    """
+    sdk_type = _PROVIDER_TYPE_MAP.get(provider_name.lower())
+    if sdk_type is None:
+        raise ValueError(
+            f"Provider '{provider_name}' cannot be mapped to a Copilot SDK "
+            f"BYOK provider.  Supported: {list(_PROVIDER_TYPE_MAP)}."
+        )
+
+    provider_cfg = providers_config.get(provider_name.lower(), {})
+    result: Dict[str, Any] = {"type": sdk_type}
+
+    base_url = provider_cfg.get("base_url")
+    if not base_url:
+        # Default base URLs for known OpenAI-compatible providers
+        _DEFAULT_BASE_URLS = {
+            "openrouter": "https://openrouter.ai/api/v1",
+        }
+        base_url = _DEFAULT_BASE_URLS.get(provider_name.lower())
+    if base_url:
+        # The Copilot SDK uses OpenAI-compatible endpoints (/chat/completions)
+        # directly under base_url.  Ollama (and similar local servers) serve
+        # that API under /v1, so append it when missing.
+        if provider_name.lower() == "local" and not base_url.rstrip("/").endswith(
+            "/v1"
+        ):
+            base_url = base_url.rstrip("/") + "/v1"
+        result["base_url"] = base_url
+
+    if api_key:
+        result["api_key"] = api_key
+    else:
+        # Fallback: let the provider resolve from env
+        from src.utils.env import read_secret
+
+        env_map = {
+            "openai": "OPENAI_API_KEY",
+            "anthropic": "ANTHROPIC_API_KEY",
+            "openrouter": "OPENROUTER_API_KEY",
+        }
+        env_var = env_map.get(provider_name.lower())
+        if env_var:
+            key = read_secret(env_var)
+            if key:
+                result["api_key"] = key
+
+    return result
+
+
+# ── MCP config mapping (decision 8 — task 3.5) ──────────────────────────
+
+
+def _build_mcp_servers(archi_config: dict) -> Optional[dict]:
+    """Map ``archi.mcp_servers`` to the SDK's ``mcpServers`` format.
+
+    Existing A2rchi format::
+
+        mcp_servers:
+          my_server:
+            transport: "stdio"
+            command: "uvx"
+            args: ["mcp-server-example"]
+          web_search:
+            transport: "sse"
+            url: "http://localhost:8080/sse"
+
+    SDK format::
+
+        mcpServers:
+          my_server:
+            type: "stdio"
+            command: "uvx"
+            args: ["mcp-server-example"]
+          web_search:
+            type: "sse"
+            url: "http://localhost:8080/sse"
+    """
+    raw = archi_config.get("mcp_servers")
+    if not raw:
+        return None
+
+    result = {}
+    for name, cfg in raw.items():
+        entry = dict(cfg)
+        # Rename 'transport' → 'type' for SDK
+        transport = entry.pop("transport", None)
+        if transport:
+            entry["type"] = transport
+        result[name] = entry
+    return result or None
+
+
+# ══════════════════════════════════════════════════════════════════════════
+#  CopilotAgentPipeline
+# ══════════════════════════════════════════════════════════════════════════
+
+
+class CopilotAgentPipeline:
+    """Agent pipeline backed by the GitHub Copilot SDK.
+
+    The pipeline is instantiated once at startup (via ``archi.update()``).
+    Each ``stream()`` / ``invoke()`` call creates a short-lived SDK session
+    with the appropriate provider, tools, and system message.
+    """
+
+    def __init__(
+        self,
+        config: Dict[str, Any],
+        *args,
+        agent_spec: Optional[Any] = None,
+        default_provider: Optional[str] = None,
+        default_model: Optional[str] = None,
+        **kwargs,
+    ) -> None:
+        self.config = config
+        self.archi_config = config.get("archi") or {}
+        self.dm_config = config.get("data_manager", {})
+
+        self.agent_spec = agent_spec
+        self.default_provider = default_provider
+        self.default_model = default_model
+
+        # Resolve selected tool names from agent spec
+        self.selected_tool_names: List[str] = []
+        if agent_spec is not None:
+            self.selected_tool_names = list(getattr(agent_spec, "tools", []) or [])
+
+        # Read prompt from agent spec or pipeline config
+        self.agent_prompt: Optional[str] = None
+        if agent_spec is not None:
+            self.agent_prompt = getattr(agent_spec, "prompt", None)
+
+        # Providers config (for BYOK mapping)
+        services_cfg = config.get("services", {})
+        chat_cfg = (
+            services_cfg.get("chat_app", {}) if isinstance(services_cfg, dict) else {}
+        )
+        self._providers_config = (
+            chat_cfg.get("providers", {}) if isinstance(chat_cfg, dict) else {}
+        )
+
+        # Shared async loop
+        self._async_loop = AsyncLoopThread.get_instance()
+
+        # Copilot Client — one per pipeline instance (decision 1)
+        self._client = _get_copilot_client_cls()()
+
+        # Optional: catalog client and MONIT client (lazy)
+        self._catalog_client = None
+        self._monit_client = None
+        self._rucio_events_skill = None
+        self._init_optional_services()
+
+    def _init_optional_services(self) -> None:
+        """Initialise optional service clients (catalog, MONIT)."""
+        # Catalog client for file/metadata tools
+        try:
+            from src.archi.pipelines.agents.tools import RemoteCatalogClient
+
+            self._catalog_client = RemoteCatalogClient.from_deployment_config(
+                self.config
+            )
+        except Exception:
+            logger.debug("Catalog client not available", exc_info=True)
+
+        # MONIT OpenSearch client
+        from src.utils.env import read_secret
+
+        monit_token = read_secret("MONIT_GRAFANA_TOKEN")
+        chat_cfg = self.config.get("services", {}).get("chat_app", {})
+        monit_url = chat_cfg.get("tools", {}).get("monit", {}).get("url")
+        if monit_token and monit_url:
+            try:
+                from src.archi.pipelines.agents.tools import \
+                    MONITOpenSearchClient
+
+                self._monit_client = MONITOpenSearchClient(
+                    url=monit_url, token=monit_token
+                )
+                from src.archi.pipelines.agents.utils.skill_utils import \
+                    load_skill
+
+                self._rucio_events_skill = load_skill("rucio_events", self.config)
+                logger.info("MONIT OpenSearch client initialised")
+            except Exception:
+                logger.debug("MONIT client init failed", exc_info=True)
+
+    # ── Tool construction ─────────────────────────────────────────────
+
+    def _build_tools(
+        self,
+        collector,
+        vectorstore: Any = None,
+    ) -> list:
+        """Build the list of ``@define_tool`` functions for a session.
+
+        Only tools listed in ``self.selected_tool_names`` are built.
+        If the list is empty all available tools are built.
+        """
+        from src.archi.pipelines.copilot_agents.tools.file_search import (
+            build_document_fetch_tool, build_file_search_tool,
+            build_metadata_schema_tool, build_metadata_search_tool)
+        from src.archi.pipelines.copilot_agents.tools.monit_search import (
+            build_monit_aggregation_tool, build_monit_search_tool)
+
+        store_docs = collector.make_store_docs_callback()
+        tools: list = []
+
+        names: Optional[set] = None
+        if self.selected_tool_names:
+            names = set(self.selected_tool_names)
+
+        def _want(name: str) -> bool:
+            return names is None or name in names
+
+        # Vectorstore retriever tool
+        if vectorstore and _want("search_vectorstore_hybrid"):
+            try:
+                from src.archi.pipelines.copilot_agents.tools.retriever import build_retriever_tool
+                from src.data_manager.vectorstore.retrievers import \
+                    HybridRetriever
+
+                retrievers_cfg = self.dm_config.get("retrievers", {})
+                hybrid_cfg = retrievers_cfg.get("hybrid_retriever", {})
+                k = hybrid_cfg.get("num_documents_to_retrieve", 5)
+                bm25_weight = hybrid_cfg.get("bm25_weight", 0.6)
+                semantic_weight = hybrid_cfg.get("semantic_weight", 0.4)
+                retriever = HybridRetriever(
+                    vectorstore=vectorstore,
+                    k=k,
+                    bm25_weight=bm25_weight,
+                    semantic_weight=semantic_weight,
+                )
+                tools.append(build_retriever_tool(retriever, store_docs=store_docs))
+            except Exception:
+                logger.warning("Could not build retriever tool", exc_info=True)
+
+        # Catalog tools
+        if self._catalog_client:
+            if _want("search_local_files"):
+                tools.append(
+                    build_file_search_tool(
+                        self._catalog_client,
+                        store_docs=store_docs,
+                    )
+                )
+            if _want("search_metadata_index"):
+                tools.append(
+                    build_metadata_search_tool(
+                        self._catalog_client,
+                        store_docs=store_docs,
+                    )
+                )
+            if _want("list_metadata_schema"):
+                tools.append(build_metadata_schema_tool(self._catalog_client))
+            if _want("fetch_catalog_document"):
+                tools.append(build_document_fetch_tool(self._catalog_client))
+
+        # MONIT tools
+        if self._monit_client:
+            monit_index = "monit_prod_cms_rucio_raw_events*"
+            if _want("monit_opensearch_search"):
+                tools.append(
+                    build_monit_search_tool(
+                        self._monit_client,
+                        index=monit_index,
+                        skill=self._rucio_events_skill,
+                    )
+                )
+            if _want("monit_opensearch_aggregation"):
+                tools.append(
+                    build_monit_aggregation_tool(
+                        self._monit_client,
+                        index=monit_index,
+                        skill=self._rucio_events_skill,
+                    )
+                )
+
+        return tools
+
+    # ── Session creation ──────────────────────────────────────────────
+
+    def _build_session_config(
+        self,
+        *,
+        api_key: Optional[str] = None,
+        tools: list,
+        provider_override: Optional[str] = None,
+        model_override: Optional[str] = None,
+    ) -> dict:
+        """Assemble the session config dict for ``client.create_session()``.
+
+        Combines:
+          - System message (customize mode — keep SDK defaults)
+          - Provider (BYOK)
+          - MCP servers
+          - Tools
+        """
+        cfg: Dict[str, Any] = {}
+
+        # System message (customize mode — decision CM)
+        if self.agent_prompt:
+            cfg["system_message"] = {
+                "mode": "customize",
+                "sections": {
+                    "identity": {
+                        "action": "replace",
+                        "content": self.agent_prompt,
+                    },
+                },
+            }
+
+        # Provider (decision 4) — per-request overrides take precedence
+        effective_provider = provider_override or self.default_provider
+        effective_model = model_override or self.default_model
+        if effective_provider and effective_model:
+            cfg["provider"] = _build_sdk_provider(
+                effective_provider,
+                effective_model,
+                self._providers_config,
+                api_key=api_key,
+            )
+            cfg["model"] = effective_model
+
+        # MCP servers (decision 8)
+        mcp = _build_mcp_servers(self.archi_config)
+        if mcp:
+            cfg["mcp_servers"] = mcp
+
+        # Tools are passed to create_session, not in config dict
+        cfg["_tools"] = tools
+
+        return cfg
+
+    async def _create_session(
+        self,
+        adapter: CopilotEventAdapter,
+        config: dict,
+        *,
+        session_id: Optional[str] = None,
+    ) -> Tuple[Any, bool]:
+        """Create or resume a Copilot SDK session with hooks attached.
+
+        Returns
+        -------
+        (session, was_resumed) : tuple
+            The SDK session and whether it was resumed from a prior session_id.
+        """
+        tools = config.pop("_tools", [])
+
+        hooks = {
+            "on_error_occurred": self._on_error_occurred,
+        }
+        tool_restrictions = _build_tool_restriction_kwargs(tools)
+
+        if session_id:
+            # Resume existing session — SDK manages conversation history
+            try:
+                session = await self._client.resume_session(
+                    session_id,
+                    tools=tools,
+                    on_permission_request=self._on_permission_request,
+                    streaming=True,
+                    hooks=hooks,
+                    **tool_restrictions,
+                    **config,
+                )
+                logger.debug("Resumed session %s", session_id)
+                return session, True
+            except Exception:
+                logger.info(
+                    "Could not resume session %s — creating new",
+                    session_id,
+                    exc_info=True,
+                )
+                # Don't reuse a failed session_id for the new session
+                session_id = None
+
+        # Create a new session
+        create_kwargs: Dict[str, Any] = dict(
+            tools=tools,
+            on_permission_request=self._on_permission_request,
+            streaming=True,
+            hooks=hooks,
+            **tool_restrictions,
+            **config,
+        )
+        if session_id:
+            create_kwargs["session_id"] = session_id
+
+        # Log provider type/model without leaking API keys
+        provider_info = config.get("provider", "default")
+        if isinstance(provider_info, dict):
+            provider_info = {k: v for k, v in provider_info.items() if k != "api_key"}
+        logger.info(
+            "Creating Copilot session with %d tools, restrictions=%s, provider=%s, model=%s",
+            len(tools),
+            tool_restrictions,
+            provider_info,
+            config.get("model", "default"),
+        )
+
+        session = await self._client.create_session(**create_kwargs)
+        return session, False
+
+    # ── Error hook (decision EH) ──────────────────────────────────────
+
+    def _on_error_occurred(self, hook_input, context=None):
+        """Handle SDK errors — retry transient model errors, log all."""
+        error = (
+            hook_input.get("error", "")
+            if isinstance(hook_input, dict)
+            else getattr(hook_input, "error", "")
+        )
+        error_context = (
+            hook_input.get("errorContext", "")
+            if isinstance(hook_input, dict)
+            else getattr(hook_input, "errorContext", "")
+        )
+        recoverable = (
+            hook_input.get("recoverable", False)
+            if isinstance(hook_input, dict)
+            else getattr(hook_input, "recoverable", False)
+        )
+
+        logger.error(
+            "Copilot SDK error: context=%s recoverable=%s error=%s",
+            error_context,
+            recoverable,
+            error,
+        )
+
+        if recoverable and error_context == "model_call":
+            return {
+                "errorHandling": "retry",
+                "retryCount": 2,
+                "userNotification": "Model request failed, retrying...",
+            }
+
+        # Non-recoverable: let SDK handle it (session.error event will fire)
+        return None
+
+    def _allowed_custom_tool_names(self) -> set[str]:
+        """Return the set of Archi custom tools the active agent is allowed to run."""
+        from src.archi.pipelines.copilot_agents.tools import TOOL_REGISTRY
+
+        if not self.selected_tool_names:
+            return set(TOOL_REGISTRY.keys())
+        return set(self.selected_tool_names)
+
+    def _on_permission_request(self, request, invocation):
+        """Allow only declared Archi custom tools and deny SDK built-ins."""
+        from copilot.generated.session_events import PermissionRequestKind
+        from copilot.types import PermissionRequestResult
+
+        kind = getattr(request, "kind", None)
+        tool_name = getattr(request, "tool_name", "") or ""
+        command_text = getattr(request, "full_command_text", None)
+
+        if kind == PermissionRequestKind.CUSTOM_TOOL:
+            if tool_name in self._allowed_custom_tool_names():
+                return PermissionRequestResult(kind="approved")
+            logger.warning(
+                "Denied custom tool permission request: tool=%s invocation=%s",
+                tool_name or "<missing>",
+                invocation,
+            )
+            return PermissionRequestResult(
+                kind="denied",
+                message=f"Tool '{tool_name or 'unknown'}' is not allowed in this Archi agent.",
+            )
+
+        logger.warning(
+            "Denied non-custom permission request: kind=%s tool=%s command=%s invocation=%s",
+            getattr(kind, "value", kind),
+            tool_name or "<none>",
+            command_text or "<none>",
+            invocation,
+        )
+        return PermissionRequestResult(
+            kind="denied",
+            message="Only Archi custom tools are allowed in this deployment.",
+        )
+
+    # ── Public API ────────────────────────────────────────────────────
+
+    def stream(self, **kwargs) -> Iterator[PipelineOutput]:
+        """Stream agent events as ``PipelineOutput`` objects.
+
+        Accepted kwargs: ``history``, ``conversation_id``,
+        ``pipeline_session_id``, ``vectorstore``, ``user_id`` (for BYOK
+        resolution), ``provider``, ``model``, ``provider_api_key``
+        (per-request overrides from settings UI).
+        """
+        history = kwargs.get("history")
+        conversation_id = kwargs.get("conversation_id")
+        vectorstore = kwargs.get("vectorstore")
+        user_id = kwargs.get("user_id")
+        provider_override = kwargs.get("provider")
+        model_override = kwargs.get("model")
+        session_api_key = kwargs.get("provider_api_key")
+
+        # Per-request document collector
+        from src.archi.pipelines.copilot_agents.tools import DocumentCollector
+
+        collector = DocumentCollector()
+
+        # Build tools for this request
+        tools = self._build_tools(collector, vectorstore=vectorstore)
+        logger.info(
+            "Built %d tools for session: %s",
+            len(tools),
+            [getattr(t, "name", getattr(t, "__name__", str(t))) for t in tools],
+        )
+
+        # Resolve BYOK key: session-provided key takes precedence over DB key
+        api_key = session_api_key or self._resolve_byok_key(user_id)
+
+        # Session config
+        session_config = self._build_session_config(
+            api_key=api_key,
+            tools=tools,
+            provider_override=provider_override,
+            model_override=model_override,
+        )
+
+        # Resume only when chat metadata has a real Copilot SDK session ID.
+        session_id = kwargs.get("pipeline_session_id")
+
+        # Adapter bridges async SDK → sync generator
+        adapter = CopilotEventAdapter(self._async_loop)
+        active_session_id: Optional[str] = None
+
+        # Create session and start consuming events (async)
+        async def _run_session():
+            nonlocal active_session_id
+            try:
+                session, was_resumed = await self._create_session(
+                    adapter,
+                    session_config,
+                    session_id=session_id,
+                )
+                active_session_id = getattr(session, "session_id", None)
+
+                # Build the prompt.  The SDK session is stateful so when
+                # resumed it already knows prior turns.  For a *new* session
+                # with prior history we prepend earlier turns so the model
+                # has full context.
+                last_msg = ""
+                if history:
+                    last_pair = history[-1]
+                    if last_pair[0].lower() in ("user", "human"):
+                        last_msg = last_pair[1]
+
+                    # Prepend earlier turns when there are >1 history pairs
+                    # and the session was freshly created (not resumed).
+                    if len(history) > 1 and not was_resumed:
+                        prior = []
+                        for role, content in history[:-1]:
+                            label = (
+                                "User"
+                                if role.lower() in ("user", "human")
+                                else "Assistant"
+                            )
+                            prior.append(f"{label}: {content}")
+                        prefix = "\n".join(prior)
+                        last_msg = (
+                            f"[Prior conversation context]\n{prefix}\n"
+                            f"[End of prior context]\n\n{last_msg}"
+                        )
+
+                # Register event handler and send the user's message
+                adapter.attach_to_session(session)
+                await session.send_and_wait(last_msg, timeout=120.0)
+            except Exception as exc:
+                logger.error("Copilot session error: %s", exc, exc_info=True)
+                adapter._queue.put(
+                    PipelineOutput(
+                        answer="",
+                        metadata={"event_type": "error", "error": str(exc)},
+                        final=False,
+                    )
+                )
+            finally:
+                adapter.signal_done()
+
+        # Schedule async work on the background loop
+        import concurrent.futures
+
+        future = self._async_loop.run_no_wait(_run_session())
+
+        # Yield events from the sync iterator
+        try:
+            for output in adapter.iter_outputs():
+                yield output
+        finally:
+            # Wait for async work to finish
+            try:
+                future.result(timeout=5.0)
+            except Exception:
+                logger.debug("Session future cleanup error", exc_info=True)
+
+        # Yield the final output with source documents
+        final = adapter.build_final_output(
+            source_documents=collector.unique_documents(),
+            retriever_scores=collector.scores(),
+        )
+        if active_session_id:
+            final.metadata["pipeline_session_id"] = active_session_id
+        yield final
+
+    def supports_persisted_session_id(self) -> bool:
+        """Copilot sessions can be resumed using a persisted SDK session ID."""
+        return True
+
+    def invoke(self, **kwargs) -> PipelineOutput:
+        """Run the agent and return the final ``PipelineOutput``.
+
+        Consumes ``stream()`` internally (decision 1b).
+        """
+        last_output = None
+        for output in self.stream(**kwargs):
+            last_output = output
+        if last_output is None:
+            return PipelineOutput(answer="", final=True)
+        return last_output
+
+    async def astream(self, **kwargs) -> AsyncIterator[PipelineOutput]:
+        """Async streaming — wraps the sync stream in an executor.
+
+        For true async callers.  The underlying SDK is async but the
+        adapter uses a queue bridge, so this is a convenience wrapper.
+        """
+        import asyncio
+
+        loop = asyncio.get_event_loop()
+
+        q: "asyncio.Queue[Optional[PipelineOutput]]" = asyncio.Queue()
+
+        def _pump():
+            try:
+                for output in self.stream(**kwargs):
+                    loop.call_soon_threadsafe(q.put_nowait, output)
+            finally:
+                loop.call_soon_threadsafe(q.put_nowait, None)
+
+        executor_task = loop.run_in_executor(None, _pump)
+
+        while True:
+            item = await q.get()
+            if item is None:
+                break
+            yield item
+
+        await executor_task
+
+    # ── BYOK resolution ──────────────────────────────────────────────
+
+    def _resolve_byok_key(self, user_id: Optional[str]) -> Optional[str]:
+        """Resolve a BYOK API key for the current provider and user."""
+        if not user_id or not self.default_provider:
+            return None
+        try:
+            from src.archi.providers.byok_resolver import get_byok_resolver
+
+            resolver = get_byok_resolver()
+            return resolver.get_byok_key(self.default_provider, user_id)
+        except Exception:
+            logger.debug("BYOK resolution failed", exc_info=True)
+            return None
+
+    # ── Tool registry (decision 17) ──────────────────────────────────
+
+    def get_tool_registry(self) -> Dict[str, Callable]:
+        """Return tool name -> factory mapping for the agent spec editor."""
+        from src.archi.pipelines.copilot_agents.tools import TOOL_REGISTRY
+
+        return {name: entry["factory"] for name, entry in TOOL_REGISTRY.items()}
+
+    def get_tool_descriptions(self) -> Dict[str, str]:
+        """Return tool name -> description mapping for UI display."""
+        from src.archi.pipelines.copilot_agents.tools import TOOL_REGISTRY
+
+        return {name: entry["description"] for name, entry in TOOL_REGISTRY.items()}
diff --git a/src/archi/pipelines/copilot_agents/copilot_event_adapter.py b/src/archi/pipelines/copilot_agents/copilot_event_adapter.py
new file mode 100644
index 000000000..09c645129
--- /dev/null
+++ b/src/archi/pipelines/copilot_agents/copilot_event_adapter.py
@@ -0,0 +1,422 @@
+"""Translate Copilot SDK session events into PipelineOutput objects.
+
+The adapter subscribes to a Copilot SDK session's async event stream and
+pushes ``PipelineOutput`` objects into a thread-safe ``queue.Queue``.  A
+synchronous generator (``iter_outputs()``) drains the queue on the Flask
+thread so ChatWrapper.stream() can yield them unchanged.
+
+Key behaviours (see design.md decisions 3, 14, 18, 20):
+
+* **Text accumulation** — SDK ``message_delta`` events are accumulated
+  into ``_response_buffer``; each yielded PipelineOutput contains the
+  full accumulated text (``accumulated: true`` contract).
+* **Thinking state machine** — SDK ``reasoning_delta`` events have no
+  explicit start/end signals.  The adapter tracks ``_in_thinking`` and
+  emits paired ``thinking_start`` / ``thinking_end`` events with
+  matching ``step_id``.
+* **Tool lifecycle via streaming events** — Tool start/complete events
+  come through ``tool.execution_start`` / ``tool.execution_complete``
+  streaming events which carry a native ``toolCallId`` for
+  deterministic correlation.
+* **Cancellation cleanup** — ``iter_outputs()``'s ``finally`` block
+  calls ``session.disconnect()`` via the async loop (decision 18).
+* **Usage metadata** — Populated from the SDK session's idle /
+  final event (decision 20).
+"""
+
+from __future__ import annotations
+
+import queue
+import time
+import uuid
+from dataclasses import dataclass, field
+from datetime import datetime, timezone
+from typing import Any, Dict, Iterator, List, Optional
+
+from src.archi.utils.async_loop import AsyncLoopThread
+from src.archi.utils.output_dataclass import PipelineOutput
+from src.utils.logging import get_logger
+
+logger = get_logger(__name__)
+
+_SENTINEL = object()  # Signals end-of-stream to the queue consumer
+
+
+@dataclass
+class _ToolCallRecord:
+    """Track a single tool invocation for metadata storage (decision 12)."""
+
+    id: str
+    name: str
+    args: Dict[str, Any]
+    result: Optional[str] = None
+    created_at: str = field(
+        default_factory=lambda: datetime.now(timezone.utc).isoformat()
+    )
+    _start_time: float = field(default_factory=time.time, repr=False)
+
+
+class CopilotEventAdapter:
+    """Bridges async Copilot SDK events to sync PipelineOutput iteration.
+
+    Lifecycle::
+
+        adapter = CopilotEventAdapter(async_loop)
+        # Then call adapter.consume_session(session) from the async loop.
+        for output in adapter.iter_outputs():
+            yield output  # PipelineOutput
+    """
+
+    def __init__(self, async_loop: AsyncLoopThread) -> None:
+        self._async_loop = async_loop
+        self._queue: queue.Queue = queue.Queue()
+
+        # Text accumulation (decision 14)
+        self._response_buffer: str = ""
+
+        # Thinking state machine (decision 3)
+        self._in_thinking: bool = False
+        self._thinking_step_id: Optional[str] = None
+        self._thinking_start_time: Optional[float] = None
+        self._thinking_buffer: str = ""
+
+        # Tool tracking (decision 12)
+        self._tool_calls: List[_ToolCallRecord] = []
+        self._active_tools: Dict[str, _ToolCallRecord] = {}
+
+        # Usage metadata (decision 20)
+        self._usage: Optional[Dict[str, Any]] = None
+
+        # Session reference for cleanup
+        self._session: Any = None
+
+        # Cancellation flag
+        self._cancelled: bool = False
+
+    # ── Event-based session consumer ─────────────────────────────────
+
+    def attach_to_session(self, session) -> None:
+        """Register an event handler on the session via ``session.on()``.
+
+        Events are dispatched by the SDK; this method returns immediately.
+        Call ``signal_done()`` after ``send_and_wait()`` returns to push
+        the sentinel and unblock ``iter_outputs()``.
+        """
+        self._session = session
+
+        def _on_event(event):
+            if self._cancelled:
+                return
+
+            # Compare by value string for compatibility with both real and
+            # mock SessionEventType enums.
+            raw_type = event.type
+            event_type = raw_type.value if hasattr(raw_type, "value") else str(raw_type)
+            data = event.data
+
+            if event_type in ("assistant.streaming_delta", "assistant.message_delta"):
+                delta = getattr(data, "delta_content", "") or ""
+                if delta:
+                    self._end_thinking_if_active()
+                    self._response_buffer += delta
+                    self._queue.put(
+                        PipelineOutput(
+                            answer=self._response_buffer,
+                            metadata={"event_type": "text"},
+                            final=False,
+                        )
+                    )
+
+            elif event_type == "assistant.reasoning_delta":
+                delta = (
+                    getattr(data, "delta_content", "")
+                    or getattr(data, "reasoning_text", "")
+                    or ""
+                )
+                if delta:
+                    if not self._in_thinking:
+                        self._start_thinking()
+                    self._thinking_buffer += delta
+
+            elif event_type == "assistant.message":
+                content = getattr(data, "content", "") or ""
+                if content:
+                    self._end_thinking_if_active()
+                    self._response_buffer = content
+                    self._queue.put(
+                        PipelineOutput(
+                            answer=self._response_buffer,
+                            metadata={"event_type": "text"},
+                            final=False,
+                        )
+                    )
+
+            elif event_type == "assistant.reasoning":
+                content = (
+                    getattr(data, "content", "")
+                    or getattr(data, "reasoning_text", "")
+                    or ""
+                )
+                if content:
+                    self._thinking_buffer = content
+                self._end_thinking_if_active()
+
+            elif event_type == "assistant.turn_end":
+                self._end_thinking_if_active()
+
+            elif event_type == "session.idle":
+                self._end_thinking_if_active()
+
+            elif event_type == "assistant.usage":
+                self._capture_usage(data)
+
+            elif event_type == "session.error":
+                error_msg = getattr(data, "message", "") or ""
+                logger.error("Copilot SDK session error: %s", error_msg)
+
+            # ── Tool lifecycle via streaming events ───────────────────
+            elif event_type == "tool.execution_start":
+                tool_call_id = getattr(data, "tool_call_id", "") or ""
+                tool_name = (
+                    getattr(data, "tool_name", "")
+                    or getattr(data, "name", "")
+                    or "unknown"
+                )
+                tool_args = getattr(data, "arguments", {}) or {}
+
+                record = _ToolCallRecord(
+                    id=tool_call_id, name=tool_name, args=tool_args
+                )
+                self._active_tools[tool_call_id] = record
+                self._tool_calls.append(record)
+
+                self._end_thinking_if_active()
+
+                self._queue.put(
+                    PipelineOutput(
+                        answer="",
+                        metadata={
+                            "event_type": "tool_start",
+                            "tool_call_id": tool_call_id,
+                            "tool_name": tool_name,
+                            "tool_args": tool_args,
+                        },
+                        final=False,
+                    )
+                )
+
+            elif event_type == "tool.execution_complete":
+                tool_call_id = getattr(data, "tool_call_id", "") or ""
+                result_obj = getattr(data, "result", None)
+                result_str = str(result_obj) if result_obj is not None else ""
+
+                matched_record = self._active_tools.pop(tool_call_id, None)
+                if matched_record is not None:
+                    matched_record.result = result_str
+                else:
+                    logger.warning(
+                        "tool.execution_complete for unknown tool_call_id=%s — no matching start event",
+                        tool_call_id,
+                    )
+
+                duration_ms = (
+                    int((time.time() - matched_record._start_time) * 1000)
+                    if matched_record is not None
+                    else None
+                )
+
+                self._queue.put(
+                    PipelineOutput(
+                        answer="",
+                        metadata={
+                            "event_type": "tool_output",
+                            "tool_call_id": tool_call_id,
+                            "output": result_str,
+                        },
+                        final=False,
+                    )
+                )
+                self._queue.put(
+                    PipelineOutput(
+                        answer="",
+                        metadata={
+                            "event_type": "tool_end",
+                            "tool_call_id": tool_call_id,
+                            "status": "success",
+                            "duration_ms": duration_ms,
+                        },
+                        final=False,
+                    )
+                )
+
+        session.on(_on_event)
+
+    def signal_done(self) -> None:
+        """Push the sentinel to unblock ``iter_outputs()``.
+
+        Called after ``send_and_wait()`` completes.
+        """
+        self._end_thinking_if_active()
+        if self._usage is None:
+            logger.warning(
+                "No usage data received from SDK — token counts will be missing from trace"
+            )
+        self._queue.put(_SENTINEL)
+
+    # ── Sync generator (consumed by Flask thread) ─────────────────────
+
+    def iter_outputs(self, *, poll_timeout: float = 180.0) -> Iterator[PipelineOutput]:
+        """Yield PipelineOutput objects until the session stream ends.
+
+        On GeneratorExit (stream cancelled), disconnects the SDK session.
+        Uses a poll timeout to prevent indefinite blocking if the async
+        session crashes without calling ``signal_done()``.
+        """
+        try:
+            while True:
+                try:
+                    item = self._queue.get(timeout=poll_timeout)
+                except queue.Empty:
+                    logger.warning(
+                        "Adapter queue timed out after %.0fs — session may have crashed",
+                        poll_timeout,
+                    )
+                    break
+                if item is _SENTINEL:
+                    break
+                yield item
+        except GeneratorExit:
+            self._cancelled = True
+            raise
+        finally:
+            self._cancelled = True
+            if self._session is not None:
+                try:
+                    self._async_loop.run(self._session.disconnect(), timeout=5.0)
+                except Exception:
+                    logger.debug(
+                        "Error disconnecting session in cleanup", exc_info=True
+                    )
+
+    def build_final_output(
+        self,
+        *,
+        source_documents: Optional[list] = None,
+        retriever_scores: Optional[list] = None,
+    ) -> PipelineOutput:
+        """Build the terminal PipelineOutput with accumulated state.
+
+        Called after ``iter_outputs()`` is exhausted, before the final
+        event is yielded to ChatWrapper.
+        """
+        metadata: Dict[str, Any] = {"event_type": "final"}
+
+        if self._usage is not None:
+            metadata["usage"] = self._usage
+
+        if self._tool_calls:
+            metadata["tool_calls"] = [
+                {
+                    "id": tc.id,
+                    "name": tc.name,
+                    "args": tc.args,
+                    "result": tc.result or "",
+                    "created_at": tc.created_at,
+                }
+                for tc in self._tool_calls
+            ]
+
+        if retriever_scores:
+            metadata["retriever_scores"] = retriever_scores
+
+        return PipelineOutput(
+            answer=self._response_buffer,
+            source_documents=source_documents or [],
+            metadata=metadata,
+            final=True,
+        )
+
+    # ── Internal helpers ──────────────────────────────────────────────
+
+    def _start_thinking(self) -> None:
+        self._in_thinking = True
+        self._thinking_step_id = str(uuid.uuid4())
+        self._thinking_start_time = time.time()
+        self._thinking_buffer = ""
+        self._queue.put(
+            PipelineOutput(
+                answer="",
+                metadata={
+                    "event_type": "thinking_start",
+                    "step_id": self._thinking_step_id,
+                },
+                final=False,
+            )
+        )
+
+    def _end_thinking_if_active(self) -> None:
+        if not self._in_thinking:
+            return
+        duration_ms = (
+            int((time.time() - self._thinking_start_time) * 1000)
+            if self._thinking_start_time
+            else 0
+        )
+        self._queue.put(
+            PipelineOutput(
+                answer="",
+                metadata={
+                    "event_type": "thinking_end",
+                    "step_id": self._thinking_step_id or "",
+                    "duration_ms": duration_ms,
+                    "thinking_content": self._thinking_buffer,
+                },
+                final=False,
+            )
+        )
+        self._in_thinking = False
+        self._thinking_step_id = None
+        self._thinking_start_time = None
+        self._thinking_buffer = ""
+
+    def _capture_usage(self, usage) -> None:
+        """Normalize SDK usage/data object and accumulate across events.
+
+        The SDK fires one ``assistant.usage`` event per API call.  When the
+        model invokes built-in or custom tools the session may make several
+        API calls, so we *accumulate* token counts rather than overwriting.
+        """
+        if isinstance(usage, dict):
+            raw = usage
+        else:
+            # SDK Data object from ASSISTANT_USAGE event
+            input_tokens = (
+                getattr(usage, "input_tokens", None)
+                or getattr(usage, "prompt_tokens", None)
+                or 0
+            )
+            output_tokens = (
+                getattr(usage, "output_tokens", None)
+                or getattr(usage, "completion_tokens", None)
+                or 0
+            )
+            raw = {
+                "prompt_tokens": input_tokens,
+                "completion_tokens": output_tokens,
+                "total_tokens": (input_tokens or 0) + (output_tokens or 0),
+            }
+
+        if self._usage is None:
+            self._usage = {
+                "prompt_tokens": 0,
+                "completion_tokens": 0,
+                "total_tokens": 0,
+                "context_window": None,
+            }
+
+        self._usage["prompt_tokens"] += raw.get("prompt_tokens", 0)
+        self._usage["completion_tokens"] += raw.get("completion_tokens", 0)
+        self._usage["total_tokens"] += raw.get("total_tokens", 0)
+        # context_window is a fixed property — take the latest reported value
+        if raw.get("context_window") is not None:
+            self._usage["context_window"] = raw["context_window"]
diff --git a/src/archi/pipelines/copilot_agents/tools/__init__.py b/src/archi/pipelines/copilot_agents/tools/__init__.py
new file mode 100644
index 000000000..aba793737
--- /dev/null
+++ b/src/archi/pipelines/copilot_agents/tools/__init__.py
@@ -0,0 +1,67 @@
+"""Copilot SDK tool factories and the central TOOL_REGISTRY.
+
+The registry maps canonical tool names to ``{"factory": callable, "description": str}``
+entries.  ``CopilotAgentPipeline.get_tool_registry()`` reads this mapping so that
+the agent spec editor can display available tools and their descriptions.
+"""
+
+from .document_collector import DocumentCollector
+from .file_search import (DOCUMENT_FETCH_DESCRIPTION, DOCUMENT_FETCH_NAME,
+                          FILE_SEARCH_DESCRIPTION, FILE_SEARCH_NAME,
+                          METADATA_SCHEMA_DESCRIPTION, METADATA_SCHEMA_NAME,
+                          METADATA_SEARCH_DESCRIPTION, METADATA_SEARCH_NAME,
+                          build_document_fetch_tool, build_file_search_tool,
+                          build_metadata_schema_tool,
+                          build_metadata_search_tool)
+from .monit_search import (AGGREGATION_TOOL_DESCRIPTION, AGGREGATION_TOOL_NAME,
+                           SEARCH_TOOL_DESCRIPTION, SEARCH_TOOL_NAME,
+                           build_monit_aggregation_tool,
+                           build_monit_search_tool)
+from .retriever import TOOL_DESCRIPTION as RETRIEVER_DESCRIPTION
+from .retriever import TOOL_NAME as RETRIEVER_NAME
+from .retriever import build_retriever_tool
+
+# Central tool registry: name → {factory, description}.
+# Each factory is a callable(**deps) → @define_tool-decorated function.
+TOOL_REGISTRY = {
+    RETRIEVER_NAME: {
+        "factory": build_retriever_tool,
+        "description": RETRIEVER_DESCRIPTION,
+    },
+    FILE_SEARCH_NAME: {
+        "factory": build_file_search_tool,
+        "description": FILE_SEARCH_DESCRIPTION,
+    },
+    METADATA_SEARCH_NAME: {
+        "factory": build_metadata_search_tool,
+        "description": METADATA_SEARCH_DESCRIPTION,
+    },
+    METADATA_SCHEMA_NAME: {
+        "factory": build_metadata_schema_tool,
+        "description": METADATA_SCHEMA_DESCRIPTION,
+    },
+    DOCUMENT_FETCH_NAME: {
+        "factory": build_document_fetch_tool,
+        "description": DOCUMENT_FETCH_DESCRIPTION,
+    },
+    SEARCH_TOOL_NAME: {
+        "factory": build_monit_search_tool,
+        "description": SEARCH_TOOL_DESCRIPTION,
+    },
+    AGGREGATION_TOOL_NAME: {
+        "factory": build_monit_aggregation_tool,
+        "description": AGGREGATION_TOOL_DESCRIPTION,
+    },
+}
+
+__all__ = [
+    "TOOL_REGISTRY",
+    "DocumentCollector",
+    "build_retriever_tool",
+    "build_file_search_tool",
+    "build_metadata_search_tool",
+    "build_metadata_schema_tool",
+    "build_document_fetch_tool",
+    "build_monit_search_tool",
+    "build_monit_aggregation_tool",
+]
diff --git a/src/archi/pipelines/copilot_agents/tools/document_collector.py b/src/archi/pipelines/copilot_agents/tools/document_collector.py
new file mode 100644
index 000000000..fe2430130
--- /dev/null
+++ b/src/archi/pipelines/copilot_agents/tools/document_collector.py
@@ -0,0 +1,61 @@
+"""Per-request document collector for source attribution.
+
+Provides a ``store_docs`` callback that tools call to record retrieved
+documents, plus an ``on_post_tool_use`` hook handler for MCP / built-in
+tools whose output might contain document references.
+
+After a request completes, the pipeline reads ``unique_documents()`` and
+``scores()`` to populate the final PipelineOutput (decision 5, 11).
+"""
+
+from __future__ import annotations
+
+from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple
+
+from langchain_core.documents import Document
+
+from src.utils.logging import get_logger
+
+logger = get_logger(__name__)
+
+
+class DocumentCollector:
+    """Accumulates documents retrieved by tools during a single request."""
+
+    def __init__(self) -> None:
+        self._docs: List[Tuple[str, Document]] = []  # (tool_label, doc)
+        self._scores: List[float] = []
+
+    def store_docs(self, tool_label: str, docs: Sequence[Document]) -> None:
+        """Callback passed to tool factories as ``store_docs``."""
+        for doc in docs:
+            self._docs.append((tool_label, doc))
+
+    def store_docs_with_scores(
+        self,
+        tool_label: str,
+        docs: Sequence[Document],
+        scores: Optional[Sequence[float]] = None,
+    ) -> None:
+        for i, doc in enumerate(docs):
+            self._docs.append((tool_label, doc))
+            if scores and i < len(scores):
+                self._scores.append(scores[i])
+
+    def unique_documents(self) -> List[Document]:
+        """De-duplicate by page_content hash, preserving insertion order."""
+        seen: set = set()
+        result: List[Document] = []
+        for _, doc in self._docs:
+            key = hash(doc.page_content)
+            if key not in seen:
+                seen.add(key)
+                result.append(doc)
+        return result
+
+    def scores(self) -> List[float]:
+        return list(self._scores)
+
+    def make_store_docs_callback(self) -> Callable[[str, Sequence[Document]], None]:
+        """Return a bound callback suitable for tool factory ``store_docs`` param."""
+        return self.store_docs
diff --git a/src/archi/pipelines/copilot_agents/tools/file_search.py b/src/archi/pipelines/copilot_agents/tools/file_search.py
new file mode 100644
index 000000000..0aea66fce
--- /dev/null
+++ b/src/archi/pipelines/copilot_agents/tools/file_search.py
@@ -0,0 +1,285 @@
+"""Catalog-backed file search, metadata search, and document fetch tools.
+
+Migrated from ``src.archi.pipelines.agents.tools.local_files`` to use
+the Copilot SDK ``@define_tool`` decorator with Pydantic input models.
+
+The ``RemoteCatalogClient`` is unchanged — it's imported from the
+original module to avoid duplicating HTTP client code.
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+from typing import Callable, Dict, List, Optional, Sequence, Tuple
+
+from langchain_core.documents import Document
+from pydantic import BaseModel, Field
+
+from src.archi.pipelines.agents.tools.local_files import (
+    RemoteCatalogClient, _format_files_for_llm, _format_grep_hits,
+    _render_metadata_preview)
+from src.utils.logging import get_logger
+
+logger = get_logger(__name__)
+
+
+# ── Pydantic input models ────────────────────────────────────────────────
+
+
+class FileSearchInput(BaseModel):
+    query: str = Field(description="Search query string.")
+    regex: bool = Field(default=False, description="Treat query as regex.")
+    case_sensitive: bool = Field(default=False, description="Case-sensitive search.")
+    max_results_override: Optional[int] = Field(
+        default=None, description="Override default max results."
+    )
+    max_matches_per_file: int = Field(default=3, description="Max matches per file.")
+    before: int = Field(default=0, description="Context lines before match.")
+    after: int = Field(default=0, description="Context lines after match.")
+
+
+class MetadataSearchInput(BaseModel):
+    query: str = Field(description="Metadata query with optional key:value filters.")
+
+
+class MetadataSchemaInput(BaseModel):
+    pass  # No input required
+
+
+class DocumentFetchInput(BaseModel):
+    resource_hash: str = Field(description="Resource hash from a previous search hit.")
+    max_chars: int = Field(default=4000, description="Max characters of document text.")
+
+
+# ── Tool metadata for registry ───────────────────────────────────────────
+
+FILE_SEARCH_NAME = "search_local_files"
+FILE_SEARCH_DESCRIPTION = (
+    "Grep-like search over local document contents only (not filenames/paths).\n"
+    "Input: query (string), regex=false, case_sensitive=false, max_results_override=None, "
+    "max_matches_per_file=3, before=0, after=0.\n"
+    "Output: lines grouped by file with hash/path and matching line numbers, plus context lines.\n"
+    'Example input: "timeout error" (regex=false).'
+)
+
+METADATA_SEARCH_NAME = "search_metadata_index"
+METADATA_SEARCH_DESCRIPTION = (
+    "Search document metadata stored in PostgreSQL (tickets, git, local files).\n"
+    "Input: query string with key:value filters; filters are exact matches and ANDed "
+    "within a group, OR across groups.\n"
+    "Output: list of matches with hash, path, metadata, and a short snippet."
+)
+
+METADATA_SCHEMA_NAME = "list_metadata_schema"
+METADATA_SCHEMA_DESCRIPTION = (
+    "Return metadata schema hints: supported keys, distinct source_type values, and suffixes. "
+    "Use this to learn which key:value filters are available before searching."
+)
+
+DOCUMENT_FETCH_NAME = "fetch_catalog_document"
+DOCUMENT_FETCH_DESCRIPTION = (
+    "Fetch a catalog document by resource hash after a search hit.\n"
+    "Input: resource_hash (string), max_chars=4000.\n"
+    "Output: path, metadata, and document text (truncated).\n"
+    'Example input: "abcd1234".'
+)
+
+
+# ── Factory functions ────────────────────────────────────────────────────
+
+
+def build_file_search_tool(
+    catalog: RemoteCatalogClient,
+    *,
+    name: str = FILE_SEARCH_NAME,
+    description: Optional[str] = None,
+    max_results: int = 3,
+    store_docs: Optional[Callable[[str, Sequence[Document]], None]] = None,
+):
+    from copilot import define_tool
+
+    tool_description = description or FILE_SEARCH_DESCRIPTION
+
+    @define_tool(name=name, description=tool_description)
+    async def _search_local_files(params: FileSearchInput) -> str:
+        query = params.query
+        regex = params.regex
+        case_sensitive = params.case_sensitive
+        max_results_override = params.max_results_override
+        max_matches_per_file = params.max_matches_per_file
+        before = params.before
+        after = params.after
+        if not query.strip():
+            return "Please provide a non-empty search query."
+
+        limit = max_results_override or max_results
+        try:
+            results = catalog.search(
+                query.strip(),
+                limit=limit,
+                search_content=True,
+                regex=regex,
+                case_sensitive=case_sensitive,
+                max_matches_per_file=max_matches_per_file,
+                before=before,
+                after=after,
+                mode="grep",
+            )
+        except Exception as exc:
+            logger.warning("Catalog search failed: %s", exc)
+            return "Catalog search failed."
+
+        hits: List[Dict] = list(results)
+        docs: List[Document] = []
+
+        if store_docs and hits:
+            for item in hits:
+                try:
+                    resource_hash = item.get("hash")
+                    doc_payload = (
+                        catalog.get_document(resource_hash, max_chars=4000) or {}
+                    )
+                    text = doc_payload.get("text") or ""
+                    doc_meta = doc_payload.get("metadata") or item.get("metadata") or {}
+                    docs.append(Document(page_content=text, metadata=doc_meta))
+                except Exception:
+                    continue
+
+        if store_docs:
+            store_docs(f"{name}: {query}", docs)
+
+        return _format_grep_hits(hits)
+
+    return _search_local_files
+
+
+def build_metadata_search_tool(
+    catalog: RemoteCatalogClient,
+    *,
+    name: str = METADATA_SEARCH_NAME,
+    description: Optional[str] = None,
+    max_results: int = 5,
+    store_docs: Optional[Callable[[str, Sequence[Document]], None]] = None,
+):
+    from copilot import define_tool
+
+    tool_description = description or METADATA_SEARCH_DESCRIPTION
+
+    @define_tool(name=name, description=tool_description)
+    async def _search_metadata(params: MetadataSearchInput) -> str:
+        query = params.query
+        if not query.strip():
+            return "Please provide a non-empty search query."
+
+        hits: List[Tuple[str, Path, Optional[Dict], str]] = []
+        docs: List[Document] = []
+
+        try:
+            results = catalog.search(
+                query.strip(), limit=max_results, search_content=False
+            )
+        except Exception as exc:
+            logger.warning("Metadata search failed: %s", exc)
+            return "Metadata search failed."
+
+        for item in results:
+            resource_hash = item.get("hash")
+            path = Path(item.get("path", "")) if item.get("path") else Path("")
+            metadata = (
+                item.get("metadata") if isinstance(item.get("metadata"), dict) else {}
+            )
+            snippet = item.get("snippet") or ""
+            hits.append((resource_hash, path, metadata, snippet))
+            if len(hits) >= max_results:
+                break
+
+        if store_docs and hits:
+            for resource_hash, path, metadata, _ in hits:
+                try:
+                    doc_payload = (
+                        catalog.get_document(resource_hash, max_chars=4000) or {}
+                    )
+                    text = doc_payload.get("text") or ""
+                    doc_meta = doc_payload.get("metadata") or metadata or {}
+                    docs.append(Document(page_content=text, metadata=doc_meta))
+                except Exception:
+                    continue
+
+        if store_docs:
+            store_docs(f"{name}: {query}", docs)
+
+        return _format_files_for_llm(hits)
+
+    return _search_metadata
+
+
+def build_metadata_schema_tool(
+    catalog: RemoteCatalogClient,
+    *,
+    name: str = METADATA_SCHEMA_NAME,
+    description: Optional[str] = None,
+):
+    from copilot import define_tool
+
+    tool_description = description or METADATA_SCHEMA_DESCRIPTION
+
+    @define_tool(name=name, description=tool_description)
+    async def _schema_tool() -> str:
+        try:
+            payload = catalog.schema()
+        except Exception as exc:
+            logger.warning("Metadata schema fetch failed: %s", exc)
+            return "Metadata schema fetch failed."
+        keys = payload.get("keys") or []
+        source_types = payload.get("source_types") or []
+        suffixes = payload.get("suffixes") or []
+        return (
+            "Supported keys: " + ", ".join(keys) + "\n"
+            "source_type values: " + (", ".join(source_types) or "none") + "\n"
+            "suffix values: " + (", ".join(suffixes) or "none")
+        )
+
+    return _schema_tool
+
+
+def build_document_fetch_tool(
+    catalog: RemoteCatalogClient,
+    *,
+    name: str = DOCUMENT_FETCH_NAME,
+    description: Optional[str] = None,
+    default_max_chars: int = 4000,
+):
+    from copilot import define_tool
+
+    tool_description = description or DOCUMENT_FETCH_DESCRIPTION
+
+    @define_tool(name=name, description=tool_description)
+    async def _fetch_document(params: DocumentFetchInput) -> str:
+        resource_hash = params.resource_hash
+        max_chars = params.max_chars or default_max_chars
+        if not resource_hash.strip():
+            return "Please provide a non-empty resource hash."
+
+        try:
+            doc_payload = (
+                catalog.get_document(resource_hash.strip(), max_chars=max_chars) or {}
+            )
+        except Exception as exc:
+            logger.warning("Document fetch failed: %s", exc)
+            return "Document fetch failed."
+
+        if not doc_payload:
+            return "Document not found."
+
+        path = doc_payload.get("path") or ""
+        metadata = (
+            doc_payload.get("metadata")
+            if isinstance(doc_payload.get("metadata"), dict)
+            else {}
+        )
+        text = doc_payload.get("text") or ""
+        meta_preview = _render_metadata_preview(metadata)
+
+        return f"Path: {path}\n" f"Metadata:\n{meta_preview}\n\n" f"Content:\n{text}"
+
+    return _fetch_document
diff --git a/src/archi/pipelines/copilot_agents/tools/monit_search.py b/src/archi/pipelines/copilot_agents/tools/monit_search.py
new file mode 100644
index 000000000..09c7679a1
--- /dev/null
+++ b/src/archi/pipelines/copilot_agents/tools/monit_search.py
@@ -0,0 +1,176 @@
+"""MONIT OpenSearch search and aggregation tools for the Copilot SDK.
+
+Migrated from ``src.archi.pipelines.agents.tools.monit_opensearch``.
+The ``MONITOpenSearchClient`` and response formatters are imported from
+the original module to avoid duplicating HTTP / formatting code.
+"""
+
+from __future__ import annotations
+
+from typing import Optional
+
+from pydantic import BaseModel, Field
+
+from src.archi.pipelines.agents.tools.monit_opensearch import (
+    MAX_RESULTS_HARD_LIMIT, MONITOpenSearchClient,
+    _format_aggregation_response, _format_opensearch_response)
+from src.utils.logging import get_logger
+
+logger = get_logger(__name__)
+
+
+# ── Pydantic input models ────────────────────────────────────────────────
+
+
+class OpenSearchSearchInput(BaseModel):
+    query: str = Field(description="Lucene query string.")
+    from_time: str = Field(default="now-24h", description="Start time (date math).")
+    to_time: str = Field(default="now", description="End time (date math).")
+    max_results: int = Field(default=10, description="Max documents to return.")
+
+
+class OpenSearchAggregationInput(BaseModel):
+    query: str = Field(description="Lucene query string to filter documents.")
+    group_by: str = Field(description="Field to aggregate on.")
+    agg_type: str = Field(
+        default="terms",
+        description="Aggregation type: terms, sum, avg, min, max, cardinality.",
+    )
+    top_n: int = Field(
+        default=10, description="Number of top buckets for terms aggregation."
+    )
+    from_time: str = Field(default="now-24h", description="Start time (date math).")
+    to_time: str = Field(default="now", description="End time (date math).")
+
+
+# ── Tool metadata for registry ───────────────────────────────────────────
+
+SEARCH_TOOL_NAME = "monit_opensearch_search"
+SEARCH_TOOL_DESCRIPTION = "Search MONIT OpenSearch for CMS Rucio events."
+
+AGGREGATION_TOOL_NAME = "monit_opensearch_aggregation"
+AGGREGATION_TOOL_DESCRIPTION = (
+    "Run aggregation queries on MONIT OpenSearch for CMS Rucio events."
+)
+
+
+# ── Factory functions ────────────────────────────────────────────────────
+
+
+def build_monit_search_tool(
+    client: MONITOpenSearchClient,
+    *,
+    tool_name: str = SEARCH_TOOL_NAME,
+    index: str,
+    skill: Optional[str] = None,
+):
+    from copilot import define_tool
+
+    # Build description, optionally appending domain skill
+    base_desc = (
+        f"Search the '{index}' OpenSearch index using Lucene query syntax.\n\n"
+        "Input parameters:\n"
+        "- query: Lucene query string (required).\n"
+        "- from_time: Start time (default: 'now-24h'). Supports date math.\n"
+        "- to_time: End time (default: 'now'). Supports date math.\n"
+        f"- max_results: Max documents to return (default: 10, hard limit: {MAX_RESULTS_HARD_LIMIT}).\n"
+    )
+    if skill:
+        base_desc += f"\n--- Domain Knowledge ---\n{skill}"
+
+    @define_tool(name=tool_name, description=base_desc)
+    async def _search_opensearch(params: OpenSearchSearchInput) -> str:
+        query = params.query
+        from_time = params.from_time
+        to_time = params.to_time
+        max_results = params.max_results
+        if not query or not query.strip():
+            return "Please provide a non-empty Lucene query."
+
+        effective_max = min(max_results, MAX_RESULTS_HARD_LIMIT)
+
+        try:
+            response = client.search_with_lucene(
+                lucene_query=query.strip(),
+                from_time=from_time,
+                to_time=to_time,
+                size=effective_max,
+                index=index,
+            )
+            return _format_opensearch_response(
+                response,
+                query.strip(),
+                index,
+                effective_max,
+                from_time=from_time,
+                to_time=to_time,
+            )
+        except Exception as e:
+            logger.error("OpenSearch query error: %s", e, exc_info=True)
+            return f"Error querying OpenSearch: {e}"
+
+    return _search_opensearch
+
+
+def build_monit_aggregation_tool(
+    client: MONITOpenSearchClient,
+    *,
+    tool_name: str = AGGREGATION_TOOL_NAME,
+    index: str,
+    skill: Optional[str] = None,
+):
+    from copilot import define_tool
+
+    base_desc = (
+        f"Run aggregation queries on the '{index}' OpenSearch index.\n\n"
+        "Use this for counting, grouping, statistics — NOT for fetching individual documents.\n\n"
+        "Input parameters:\n"
+        "- query: Lucene query string to filter documents (required). Use '*' for all.\n"
+        "- group_by: Field to aggregate on (required).\n"
+        "- agg_type: Aggregation type (default: 'terms'). One of: terms, sum, avg, min, max, cardinality.\n"
+        "- top_n: Number of top buckets for terms aggregation (default: 10, max: 100).\n"
+        "- from_time: Start time (default: 'now-24h'). Supports date math.\n"
+        "- to_time: End time (default: 'now'). Supports date math.\n"
+    )
+    if skill:
+        base_desc += f"\n--- Domain Knowledge ---\n{skill}"
+
+    @define_tool(name=tool_name, description=base_desc)
+    async def _aggregate_opensearch(params: OpenSearchAggregationInput) -> str:
+        query = params.query
+        group_by = params.group_by
+        agg_type = params.agg_type
+        top_n = params.top_n
+        from_time = params.from_time
+        to_time = params.to_time
+        if not query or not query.strip():
+            return (
+                "Please provide a non-empty Lucene query (use '*' for all documents)."
+            )
+        if not group_by or not group_by.strip():
+            return "Please provide a field to aggregate on (group_by)."
+
+        try:
+            response = client.search_with_aggregation(
+                lucene_query=query.strip(),
+                group_by=group_by.strip(),
+                agg_type=agg_type,
+                top_n=top_n,
+                from_time=from_time,
+                to_time=to_time,
+                index=index,
+            )
+            return _format_aggregation_response(
+                response,
+                query.strip(),
+                index,
+                group_by.strip(),
+                agg_type,
+                from_time=from_time,
+                to_time=to_time,
+            )
+        except Exception as e:
+            logger.error("OpenSearch aggregation error: %s", e, exc_info=True)
+            return f"Error running aggregation: {e}"
+
+    return _aggregate_opensearch
diff --git a/src/archi/pipelines/copilot_agents/tools/retriever.py b/src/archi/pipelines/copilot_agents/tools/retriever.py
new file mode 100644
index 000000000..052555c09
--- /dev/null
+++ b/src/archi/pipelines/copilot_agents/tools/retriever.py
@@ -0,0 +1,73 @@
+"""Retriever tool — wraps a BaseRetriever for the Copilot SDK.
+
+Factory: ``build_retriever_tool(retriever, *, store_docs, ...)``
+Returns a ``@define_tool``-decorated async callable.
+
+Core helpers (``_normalize_results``, ``_format_documents_for_llm``) are
+imported from the LangGraph retriever module to avoid duplication — the
+same pattern used by ``file_search`` and ``monit_search``.
+"""
+
+from __future__ import annotations
+
+from typing import Callable, Optional, Sequence
+
+from langchain_core.documents import Document
+from langchain_core.retrievers import BaseRetriever
+from pydantic import BaseModel, Field
+
+from src.archi.pipelines.agents.tools.retriever import (
+    _format_documents_for_llm, _normalize_results)
+from src.utils.logging import get_logger
+
+logger = get_logger(__name__)
+
+
+# ── Pydantic input model ─────────────────────────────────────────────────
+
+
+class RetrieverInput(BaseModel):
+    query: str = Field(description="Search query for the knowledge base.")
+
+
+# ── Factory ──────────────────────────────────────────────────────────────
+
+TOOL_NAME = "search_vectorstore_hybrid"
+TOOL_DESCRIPTION = (
+    "Search the indexed knowledge base for relevant passages.\n"
+    "Input: query string.\n"
+    "Output: ranked snippets with source filename, resource hash, and score.\n"
+    'Example input: "transfer errors in CMS".'
+)
+
+
+def build_retriever_tool(
+    retriever: BaseRetriever,
+    *,
+    name: str = TOOL_NAME,
+    description: Optional[str] = None,
+    max_documents: int = 4,
+    max_chars: int = 800,
+    store_docs: Optional[Callable[[str, Sequence[Document]], None]] = None,
+):
+    """Return a ``@define_tool``-decorated async function.
+
+    Dependencies are captured via closure — the returned callable only
+    receives the Pydantic-validated ``RetrieverInput`` at invocation time.
+    """
+    from copilot import define_tool  # deferred import
+
+    tool_description = description or TOOL_DESCRIPTION
+
+    @define_tool(name=name, description=tool_description)
+    async def _retriever_tool(params: RetrieverInput) -> str:
+        query = params.query
+        results = retriever.invoke(query)
+        docs = _normalize_results(results or [])
+        if store_docs:
+            store_docs(f"{name}: {query}", [doc for doc, _ in docs])
+        return _format_documents_for_llm(
+            docs, max_documents=max_documents, max_chars=max_chars
+        )
+
+    return _retriever_tool
diff --git a/src/archi/providers/__init__.py b/src/archi/providers/__init__.py
index cc968f5b7..80a9dcbf1 100644
--- a/src/archi/providers/__init__.py
+++ b/src/archi/providers/__init__.py
@@ -38,7 +38,6 @@
 _DEFAULT_API_KEY_ENV_BY_PROVIDER: Dict[ProviderType, str] = {
     ProviderType.OPENAI: "OPENAI_API_KEY",
     ProviderType.ANTHROPIC: "ANTHROPIC_API_KEY",
-    ProviderType.GEMINI: "GEMINI_API_KEY",
     ProviderType.OPENROUTER: "OPENROUTER_API_KEY",
     ProviderType.CERN_LITELLM: "CERN_LITELLM_API_KEY",
 }
@@ -73,14 +72,12 @@ def _ensure_providers_registered() -> None:
     # Import and register all providers
     from src.archi.providers.openai_provider import OpenAIProvider
     from src.archi.providers.anthropic_provider import AnthropicProvider
-    from src.archi.providers.gemini_provider import GeminiProvider
     from src.archi.providers.openrouter_provider import OpenRouterProvider
     from src.archi.providers.local_provider import LocalProvider
     from src.archi.providers.cern_litellm_provider import CERNLiteLLMProvider
     
     register_provider(ProviderType.OPENAI, OpenAIProvider)
     register_provider(ProviderType.ANTHROPIC, AnthropicProvider)
-    register_provider(ProviderType.GEMINI, GeminiProvider)
     register_provider(ProviderType.OPENROUTER, OpenRouterProvider)
     register_provider(ProviderType.LOCAL, LocalProvider)
     register_provider(ProviderType.CERN_LITELLM, CERNLiteLLMProvider)
@@ -144,7 +141,6 @@ def get_provider_by_name(name: str, **kwargs) -> BaseProvider:
     This is a convenience function that accepts common names like:
     - "openai", "OpenAI"
     - "anthropic", "claude", "Anthropic"
-    - "gemini", "google", "Gemini"
     - "openrouter", "OpenRouter"
     - "local", "ollama", "Local"
     
@@ -163,8 +159,6 @@ def get_provider_by_name(name: str, **kwargs) -> BaseProvider:
         "gpt": ProviderType.OPENAI,
         "anthropic": ProviderType.ANTHROPIC,
         "claude": ProviderType.ANTHROPIC,
-        "gemini": ProviderType.GEMINI,
-        "google": ProviderType.GEMINI,
         "openrouter": ProviderType.OPENROUTER,
         "local": ProviderType.LOCAL,
         "ollama": ProviderType.LOCAL,
diff --git a/src/archi/providers/base.py b/src/archi/providers/base.py
index 8157c70b3..3e91524b4 100644
--- a/src/archi/providers/base.py
+++ b/src/archi/providers/base.py
@@ -22,7 +22,6 @@ class ProviderType(str, Enum):
     """Enumeration of supported provider types."""
     OPENAI = "openai"
     ANTHROPIC = "anthropic"
-    GEMINI = "gemini"
     OPENROUTER = "openrouter"
     LOCAL = "local"
     CERN_LITELLM = "cern_litellm"
diff --git a/src/archi/providers/gemini_provider.py b/src/archi/providers/gemini_provider.py
deleted file mode 100644
index fd95c7f3f..000000000
--- a/src/archi/providers/gemini_provider.py
+++ /dev/null
@@ -1,103 +0,0 @@
-"""Google Gemini provider implementation."""
-
-from typing import Any, Dict, List, Optional
-
-from src.archi.providers.base import (
-    BaseProvider,
-    ModelInfo,
-    ProviderConfig,
-    ProviderType,
-)
-from src.utils.logging import get_logger
-
-logger = get_logger(__name__)
-
-
-# Default models available from Google Gemini
-DEFAULT_GEMINI_MODELS = [
-    ModelInfo(
-        id="gemini-2.0-flash",
-        name="gemini-2.0-flash",
-        display_name="Gemini 2.0 Flash",
-        context_window=1048576,
-        supports_tools=True,
-        supports_streaming=True,
-        supports_vision=True,
-        max_output_tokens=8192,
-    ),
-    ModelInfo(
-        id="gemini-2.0-flash-thinking",
-        name="gemini-2.0-flash-thinking",
-        display_name="Gemini 2.0 Flash Thinking",
-        context_window=1048576,
-        supports_tools=True,
-        supports_streaming=True,
-        supports_vision=True,
-        max_output_tokens=8192,
-    ),
-    ModelInfo(
-        id="gemini-1.5-pro",
-        name="gemini-1.5-pro",
-        display_name="Gemini 1.5 Pro",
-        context_window=2097152,
-        supports_tools=True,
-        supports_streaming=True,
-        supports_vision=True,
-        max_output_tokens=8192,
-    ),
-    ModelInfo(
-        id="gemini-1.5-flash",
-        name="gemini-1.5-flash",
-        display_name="Gemini 1.5 Flash",
-        context_window=1048576,
-        supports_tools=True,
-        supports_streaming=True,
-        supports_vision=True,
-        max_output_tokens=8192,
-    ),
-]
-
-
-class GeminiProvider(BaseProvider):
-    """Provider for Google Gemini models."""
-    
-    provider_type = ProviderType.GEMINI
-    display_name = "Google Gemini"
-    
-    def __init__(self, config: Optional[ProviderConfig] = None):
-        if config is None:
-            config = ProviderConfig(
-                provider_type=ProviderType.GEMINI,
-                api_key_env="GOOGLE_API_KEY",
-                models=DEFAULT_GEMINI_MODELS,
-                default_model="gemini-2.0-flash",
-            )
-        super().__init__(config)
-    
-    def get_chat_model(self, model_name: str, **kwargs):
-        """Get a Gemini chat model instance."""
-        try:
-            from langchain_google_genai import ChatGoogleGenerativeAI
-        except ImportError:
-            raise ImportError(
-                "langchain-google-genai is required for Gemini provider. "
-                "Install with: pip install langchain-google-genai"
-            )
-        
-        model_kwargs = {
-            "model": model_name,
-            "streaming": True,
-            **self.config.extra_kwargs,
-            **kwargs,
-        }
-        
-        if self._api_key:
-            model_kwargs["google_api_key"] = self._api_key
-            
-        return ChatGoogleGenerativeAI(**model_kwargs)
-    
-    def list_models(self) -> List[ModelInfo]:
-        """List available Gemini models."""
-        if self.config.models:
-            return self.config.models
-        return DEFAULT_GEMINI_MODELS
diff --git a/src/archi/pipelines/agents/utils/mcp_utils.py b/src/archi/utils/async_loop.py
similarity index 71%
rename from src/archi/pipelines/agents/utils/mcp_utils.py
rename to src/archi/utils/async_loop.py
index 5135f6d04..45838cd86 100644
--- a/src/archi/pipelines/agents/utils/mcp_utils.py
+++ b/src/archi/utils/async_loop.py
@@ -1,16 +1,26 @@
-from typing import Optional, Any
+"""Dedicated background thread running a single asyncio event loop.
+
+Provides a thread-safe bridge for scheduling async coroutines from
+synchronous (Flask) code.  Used by the Copilot SDK adapter and,
+historically, by MCP tool wrappers.
+"""
+
 import asyncio
 import threading
+from typing import Any, Optional
+
 from src.utils.logging import get_logger
 
 logger = get_logger(__name__)
 
+
 class AsyncLoopThread:
     """
     A dedicated background thread running a single event loop.
 
-    This ensures all async operations (MCP client init, tool calls) happen
-    on the same event loop, preventing ClosedResourceError.
+    This ensures all async operations (Copilot SDK calls, MCP client init,
+    tool calls) happen on the same event loop, preventing
+    ClosedResourceError.
 
     Usage:
         runner = AsyncLoopThread.get_instance()
@@ -26,14 +36,14 @@ def __init__(self):
         self.thread = threading.Thread(
             target=self._run,
             daemon=True,
-            name="mcp-async-loop"
+            name="async-loop",
         )
         self.thread.start()
 
         # Wait for the loop to actually start before returning
         if not self._started.wait(timeout=10.0):
             raise RuntimeError("Failed to start async loop thread")
-        logger.info("Background async loop started for MCP operations")
+        logger.info("Background async loop started")
 
     def _run(self):
         """Run the event loop forever in the background thread."""
@@ -47,7 +57,7 @@ def run(self, coro, timeout: Optional[float] = 120.0) -> Any:
 
         Args:
             coro: An awaitable coroutine
-            timeout: Maximum seconds to wait (default 120s for MCP operations)
+            timeout: Maximum seconds to wait (default 120s)
 
         Returns:
             The result of the coroutine
@@ -59,10 +69,17 @@ def run(self, coro, timeout: Optional[float] = 120.0) -> Any:
         future = asyncio.run_coroutine_threadsafe(coro, self.loop)
         return future.result(timeout=timeout)
 
+    def run_no_wait(self, coro) -> "asyncio.Future":
+        """Schedule a coroutine on the background loop without blocking.
+
+        Returns the ``concurrent.futures.Future`` so the caller can check
+        completion later (e.g. ``future.result(timeout=...)``).
+        """
+        return asyncio.run_coroutine_threadsafe(coro, self.loop)
+
     def in_loop_thread(self) -> bool:
         """Return True if called from the background event-loop thread."""
         return threading.current_thread() is self.thread
-        # or: return threading.get_ident() == self.thread.ident
 
     @classmethod
     def get_instance(cls) -> "AsyncLoopThread":
diff --git a/src/cli/templates/dockerfiles/Dockerfile-base b/src/cli/templates/dockerfiles/Dockerfile-base
index 00db4db30..138d057c4 100644
--- a/src/cli/templates/dockerfiles/Dockerfile-base
+++ b/src/cli/templates/dockerfiles/Dockerfile-base
@@ -1,6 +1,6 @@
 # syntax=docker/dockerfile:1
 # FROM pytorch/pytorch:2.0.1-cuda11.7-cudnn8-devel
-FROM docker.io/library/python:3.10
+FROM docker.io/library/python:3.11
 RUN mkdir -p /root/archi
 WORKDIR /root/archi
 
diff --git a/src/cli/templates/dockerfiles/Dockerfile-chat b/src/cli/templates/dockerfiles/Dockerfile-chat
index 06e699f40..d54a263ae 100644
--- a/src/cli/templates/dockerfiles/Dockerfile-chat
+++ b/src/cli/templates/dockerfiles/Dockerfile-chat
@@ -34,6 +34,20 @@ RUN GECKO_VERSION="v0.36.0" && \
     && chmod +x /usr/local/bin/geckodriver \
     && rm "geckodriver-${GECKO_VERSION}-${GECKO_ARCH}.tar.gz"
 
+# Install GitHub Copilot CLI (used by the Copilot SDK agent runtime)
+ARG COPILOT_CLI_VERSION=v1.0.12
+RUN ARCH=$(uname -m) && \
+    if [ "$ARCH" = "aarch64" ]; then CLI_ARCH="arm64"; else CLI_ARCH="x64"; fi && \
+    if [ "$COPILOT_CLI_VERSION" = "latest" ]; then \
+        DOWNLOAD_URL="https://github.com/github/copilot-cli/releases/latest/download/copilot-linux-${CLI_ARCH}.tar.gz"; \
+    else \
+        DOWNLOAD_URL="https://github.com/github/copilot-cli/releases/download/${COPILOT_CLI_VERSION}/copilot-linux-${CLI_ARCH}.tar.gz"; \
+    fi && \
+    wget -q -O /tmp/copilot.tar.gz "$DOWNLOAD_URL" && \
+    tar -xzf /tmp/copilot.tar.gz -C /usr/local/bin && \
+    chmod +x /usr/local/bin/copilot && \
+    rm /tmp/copilot.tar.gz
+
 COPY archi_code src
 COPY configs configs
 COPY pyproject.toml pyproject.toml
diff --git a/src/cli/templates/dockerfiles/base-python-image/Dockerfile b/src/cli/templates/dockerfiles/base-python-image/Dockerfile
index 00db4db30..138d057c4 100644
--- a/src/cli/templates/dockerfiles/base-python-image/Dockerfile
+++ b/src/cli/templates/dockerfiles/base-python-image/Dockerfile
@@ -1,6 +1,6 @@
 # syntax=docker/dockerfile:1
 # FROM pytorch/pytorch:2.0.1-cuda11.7-cudnn8-devel
-FROM docker.io/library/python:3.10
+FROM docker.io/library/python:3.11
 RUN mkdir -p /root/archi
 WORKDIR /root/archi
 
diff --git a/src/cli/templates/dockerfiles/base-python-image/requirements.txt b/src/cli/templates/dockerfiles/base-python-image/requirements.txt
index 9660f2fa5..a45340781 100644
--- a/src/cli/templates/dockerfiles/base-python-image/requirements.txt
+++ b/src/cli/templates/dockerfiles/base-python-image/requirements.txt
@@ -22,6 +22,7 @@ httptools==0.6.1
 httpx==0.27.2
 humanfriendly==10.0
 croniter==2.0.5
+github-copilot-sdk>=0.2.0
 langgraph==1.0.2
 langchain-mcp-adapters==0.1.11
 langchain==1.0.3
diff --git a/src/cli/templates/init.sql b/src/cli/templates/init.sql
index 1334fc23c..9960ab413 100644
--- a/src/cli/templates/init.sql
+++ b/src/cli/templates/init.sql
@@ -355,9 +355,13 @@ CREATE TABLE IF NOT EXISTS conversation_metadata (
     title TEXT,
     created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
     last_message_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
-    archi_version VARCHAR(50)
+    archi_version VARCHAR(50),
+    pipeline_session_id TEXT
 );
 
+ALTER TABLE conversation_metadata
+ADD COLUMN IF NOT EXISTS pipeline_session_id TEXT;
+
 CREATE INDEX IF NOT EXISTS idx_conv_meta_user ON conversation_metadata(user_id);
 CREATE INDEX IF NOT EXISTS idx_conv_meta_client ON conversation_metadata(client_id);
 
diff --git a/src/data_manager/collectors/tickets/ticket_manager.py b/src/data_manager/collectors/tickets/ticket_manager.py
index 90e4f7989..3edfc2efe 100644
--- a/src/data_manager/collectors/tickets/ticket_manager.py
+++ b/src/data_manager/collectors/tickets/ticket_manager.py
@@ -134,12 +134,12 @@ def _collect_from_client(
             elif name == "Redmine":
                 self.redmine_projects.update(projects)
                 outdir = self.data_path / "redmine"
+
+            for resource in resources:
+                persistence.persist_resource(resource, outdir)
         except Exception as exc:
             logger.warning(
                 f"{name} collection failed; skipping remaining tickets from this source.",
                 exc_info=exc,
             )
             return
-
-        for resource in resources:
-            persistence.persist_resource(resource, outdir,overwrite)
diff --git a/src/interfaces/chat_app/api.py b/src/interfaces/chat_app/api.py
index 76cdcdc51..08dfe5786 100644
--- a/src/interfaces/chat_app/api.py
+++ b/src/interfaces/chat_app/api.py
@@ -41,7 +41,13 @@ def _get_agent_class_name_from_config() -> Optional[str]:
     return chat_cfg.get("agent_class") or chat_cfg.get("pipeline")
 
 
-def _get_agent_tool_registry(agent_class_name: Optional[str]) -> List[str]:
+def _get_agent_tool_registry(agent_class_name: Optional[str]) -> List[dict]:
+    """Return tool objects ``{name, description}`` for the agent editor palette.
+
+    These are Archi's own custom tools from ``TOOL_REGISTRY``.  SDK built-in
+    tools (bash, sql, report_intent, etc.) are deliberately excluded — they are
+    blocked at the session level via ``available_tools`` allowlist.
+    """
     if not agent_class_name:
         return []
     try:
@@ -55,15 +61,26 @@ def _get_agent_tool_registry(agent_class_name: Optional[str]) -> List[str]:
     try:
         dummy = agent_cls.__new__(agent_cls)
         registry = agent_cls.get_tool_registry(dummy) or {}
-        return sorted([name for name in registry.keys() if isinstance(name, str)])
+        descriptions = {}
+        if hasattr(agent_cls, "get_tool_descriptions"):
+            descriptions = agent_cls.get_tool_descriptions(dummy) or {}
+        return sorted(
+            [
+                {"name": name, "description": descriptions.get(name, "")}
+                for name in registry
+                if isinstance(name, str)
+            ],
+            key=lambda t: t["name"],
+        )
     except Exception as exc:
         logger.warning("Failed to read tool registry for %s: %s", agent_class_name, exc)
         return []
 
 
-def _build_agent_template(name: str, tools: List[str]) -> str:
-    tools_block = "\n".join(f"- {tool}" for tool in tools) if tools else "- <tool_name>"
-    tools_comment = "\n".join(f"- {tool}" for tool in tools) if tools else "- (no tools available)"
+def _build_agent_template(name: str, tools: List[dict]) -> str:
+    tool_names = [t["name"] if isinstance(t, dict) else t for t in tools]
+    tools_block = "\n".join(f"- {n}" for n in tool_names) if tool_names else "- <tool_name>"
+    tools_comment = "\n".join(f"- {n}" for n in tool_names) if tool_names else "- (no tools available)"
     return (
         f"# {name}\n\n"
         "## Tools\n"
diff --git a/src/interfaces/chat_app/app.py b/src/interfaces/chat_app/app.py
index 3a9ea59b5..e2d1f8bc5 100644
--- a/src/interfaces/chat_app/app.py
+++ b/src/interfaces/chat_app/app.py
@@ -54,6 +54,10 @@
     SQL_LIST_CONVERSATIONS, SQL_GET_CONVERSATION_METADATA, SQL_DELETE_CONVERSATION,
     SQL_LIST_CONVERSATIONS_BY_USER, SQL_GET_CONVERSATION_METADATA_BY_USER,
     SQL_DELETE_CONVERSATION_BY_USER, SQL_UPDATE_CONVERSATION_TIMESTAMP_BY_USER,
+    SQL_GET_CONVERSATION_PIPELINE_SESSION_ID,
+    SQL_GET_CONVERSATION_PIPELINE_SESSION_ID_BY_USER,
+    SQL_UPDATE_CONVERSATION_PIPELINE_SESSION_ID,
+    SQL_UPDATE_CONVERSATION_PIPELINE_SESSION_ID_BY_USER,
     SQL_INSERT_TOOL_CALLS, SQL_QUERY_CONVO_WITH_FEEDBACK, SQL_DELETE_REACTION_FEEDBACK,
     SQL_GET_REACTION_FEEDBACK,
     SQL_INSERT_AB_COMPARISON, SQL_UPDATE_AB_PREFERENCE, SQL_GET_AB_COMPARISON,
@@ -244,6 +248,7 @@ def __init__(self):
             "password": read_secret("PG_PASSWORD"),
             **self.services_config["postgres"],
         }
+        self._ensure_conversation_metadata_schema()
 
         # initialize data manager (ingestion handled by data-manager service)
         # self.data_manager = DataManager(run_ingestion=False)
@@ -311,6 +316,8 @@ def __init__(self):
         # track active config/model/pipeline state
         self.default_config_name = self.config.get("name")
         self.current_config_name = None
+        self.current_model_used = None
+        self.current_pipeline_used = None
         self._config_cache = {}
         if self.default_config_name:
             self._config_cache[self.default_config_name] = self.config
@@ -319,6 +326,119 @@ def __init__(self):
         if self.default_config_name:
             self.update_config(config_name=self.default_config_name)
 
+    def _ensure_conversation_metadata_schema(self) -> None:
+        """Backfill chat metadata columns needed by newer pipelines."""
+        conn = psycopg2.connect(**self.pg_config)
+        cursor = conn.cursor()
+        try:
+            cursor.execute(
+                """
+                ALTER TABLE conversation_metadata
+                ADD COLUMN IF NOT EXISTS pipeline_session_id TEXT
+                """
+            )
+            conn.commit()
+        except psycopg2.Error as exc:
+            conn.rollback()
+            logger.debug("Could not ensure conversation metadata columns: %s", exc)
+        finally:
+            cursor.close()
+            conn.close()
+
+    def _pipeline_supports_persisted_session_id(self) -> bool:
+        """Return whether the active pipeline uses persisted SDK sessions."""
+        pipeline = getattr(self.archi, "pipeline", None)
+        return bool(
+            pipeline
+            and callable(getattr(pipeline, "supports_persisted_session_id", None))
+            and pipeline.supports_persisted_session_id()
+        )
+
+    def get_pipeline_session_id(
+        self,
+        conversation_id: int,
+        client_id: str,
+        user_id: Optional[str] = None,
+    ) -> Optional[str]:
+        """Return the persisted pipeline session ID for a conversation, if used."""
+        if not self._pipeline_supports_persisted_session_id():
+            return None
+        conn = psycopg2.connect(**self.pg_config)
+        cursor = conn.cursor()
+        try:
+            if user_id:
+                cursor.execute(
+                    SQL_GET_CONVERSATION_PIPELINE_SESSION_ID_BY_USER,
+                    (conversation_id, user_id, client_id),
+                )
+            else:
+                cursor.execute(
+                    SQL_GET_CONVERSATION_PIPELINE_SESSION_ID,
+                    (conversation_id, client_id),
+                )
+            row = cursor.fetchone()
+            return row[0] if row else None
+        finally:
+            cursor.close()
+            conn.close()
+
+    def set_pipeline_session_id(
+        self,
+        conversation_id: int,
+        client_id: str,
+        session_id: str,
+        user_id: Optional[str] = None,
+    ) -> None:
+        """Persist the pipeline session ID for future resume attempts."""
+        if not self._pipeline_supports_persisted_session_id():
+            return
+        if not session_id:
+            return
+        conn = psycopg2.connect(**self.pg_config)
+        cursor = conn.cursor()
+        try:
+            if user_id:
+                cursor.execute(
+                    SQL_UPDATE_CONVERSATION_PIPELINE_SESSION_ID_BY_USER,
+                    (session_id, conversation_id, user_id, client_id),
+                )
+            else:
+                cursor.execute(
+                    SQL_UPDATE_CONVERSATION_PIPELINE_SESSION_ID,
+                    (session_id, conversation_id, client_id),
+                )
+            conn.commit()
+        finally:
+            cursor.close()
+            conn.close()
+
+    def _persist_pipeline_session_id_from_output(
+        self,
+        output: PipelineOutput,
+        *,
+        conversation_id: int,
+        client_id: str,
+        current_session_id: Optional[str],
+        user_id: Optional[str] = None,
+    ) -> Optional[str]:
+        """Store the pipeline session ID exposed by pipeline outputs."""
+        if not self._pipeline_supports_persisted_session_id():
+            return None
+        metadata = getattr(output, "metadata", None) or {}
+        session_id = metadata.get("pipeline_session_id")
+        if not isinstance(session_id, str) or not session_id.strip():
+            return current_session_id
+        session_id = session_id.strip()
+        if session_id == current_session_id:
+            return current_session_id
+        self.set_pipeline_session_id(
+            conversation_id,
+            client_id,
+            session_id,
+            user_id=user_id,
+        )
+        return session_id
+
     def update_config(self, config_name=None):
         """
         Update the active config and apply it to the pipeline.
@@ -376,6 +496,8 @@ def update_config(self, config_name=None):
         model_name = self._extract_model_name(config_payload)
         
         self.current_config_name = target_config_name
+        self.current_pipeline_used = agent_class
+        self.current_model_used = model_name
         self.archi.update(pipeline=agent_class, config_name=target_config_name)
 
     def _extract_model_name(self, config_payload):
@@ -1151,7 +1273,7 @@ def prepare_context_for_storage(self, source_documents, scores):
 
         return context
 
-    def insert_conversation(self, conversation_id, user_message, archi_message, link, archi_context, context:ChatRequestContext, is_refresh=False) -> List[int]:
+    def insert_conversation(self, conversation_id, user_message, archi_message, link, archi_context, is_refresh=False) -> List[int]:
         """
         """
         logger.debug("Entered insert_conversation.")
@@ -1167,20 +1289,18 @@ def _sanitize(text: str) -> str:
         user_content = _sanitize(user_content)
         archi_content = _sanitize(archi_content)
         link = _sanitize(link)
-        model_provider = f"{context.provider_used}/{context.model_used}"
-        pipeline_used = type(context.pipeline_used).__name__
         archi_context = _sanitize(archi_context)
 
         # construct insert_tups with model_used and pipeline_used
         # Format: (service, conversation_id, sender, content, link, context, ts, model_used, pipeline_used)
         insert_tups = (
             [
-                (service, conversation_id, user_sender, user_content, '', '', user_msg_ts, model_provider, pipeline_used),
-                (service, conversation_id, ARCHI_SENDER, archi_content, link, archi_context, archi_msg_ts, model_provider, pipeline_used),
+                (service, conversation_id, user_sender, user_content, '', '', user_msg_ts, self.current_model_used, self.current_pipeline_used),
+                (service, conversation_id, ARCHI_SENDER, archi_content, link, archi_context, archi_msg_ts, self.current_model_used, self.current_pipeline_used),
             ]
             if not is_refresh
             else [
-                (service, conversation_id, ARCHI_SENDER, archi_content, link, archi_context, archi_msg_ts, model_provider, pipeline_used),
+                (service, conversation_id, ARCHI_SENDER, archi_content, link, archi_context, archi_msg_ts, self.current_model_used, self.current_pipeline_used),
             ]
         )
 
@@ -1233,10 +1353,49 @@ def insert_tool_calls_from_output(self, conversation_id: int, message_id: int, o
         """
         Extract and store agent tool calls from the pipeline output.
 
-        AIMessage with tool_calls contains the tool name, args, and timestamp.
-        ToolMessage contains the result, matched by tool_call_id.
+        Checks ``metadata["tool_calls"]`` first (Copilot adapter format,
+        decision 12), then falls back to messages-based extraction
+        (classic pipelines).
         """
-        if not output or not output.messages:
+        if not output:
+            return
+
+        # ── Path 1: metadata-based tool calls (Copilot adapter) ──────
+        meta_tool_calls = (output.metadata or {}).get("tool_calls")
+        if meta_tool_calls:
+            insert_tups = []
+            for step_number, tc in enumerate(meta_tool_calls, start=1):
+                tool_name = tc.get("name", "unknown")
+                tool_args = tc.get("args", {})
+                tool_result = tc.get("result", "")
+                if len(tool_result) > 500:
+                    tool_result = tool_result[:500] + "..."
+                created_at = tc.get("created_at")
+                if created_at:
+                    try:
+                        ts = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
+                    except (ValueError, TypeError):
+                        ts = datetime.now(timezone.utc)
+                else:
+                    ts = datetime.now(timezone.utc)
+                insert_tups.append((
+                    conversation_id, message_id, step_number,
+                    tool_name,
+                    json.dumps(tool_args) if tool_args else None,
+                    tool_result, ts,
+                ))
+            if insert_tups:
+                logger.debug("Inserting %d tool calls (metadata) for message %d", len(insert_tups), message_id)
+                conn = psycopg2.connect(**self.pg_config)
+                cursor = conn.cursor()
+                psycopg2.extras.execute_values(cursor, SQL_INSERT_TOOL_CALLS, insert_tups)
+                conn.commit()
+                cursor.close()
+                conn.close()
+            return
+
+        # ── Path 2: messages-based extraction (classic pipelines) ────
+        if not output.messages:
             return
 
         tool_calls = output.extract_tool_calls()
@@ -1527,7 +1686,6 @@ def _finalize_result(
             archi_message,
             best_reference,
             context_data,
-            context,
             context.is_refresh,
         )
         timestamps["insert_convo_ts"] = datetime.now(timezone.utc)
@@ -1547,7 +1705,7 @@ def _finalize_result(
                     has_tool_calls,
                     has_tool_call_id,
                 )
-        if agent_messages and message_ids:
+        if message_ids:
             archi_message_id = message_ids[-1]
             self.insert_tool_calls_from_output(context.conversation_id, archi_message_id, result)
 
@@ -1581,7 +1739,23 @@ def __call__(self, message: List[str], conversation_id: int|None, client_id: str
             requested_config = self._resolve_config_name(config_name)
             self.update_config(config_name=requested_config)
 
-            result = self.archi(history=context.history, conversation_id=context.conversation_id)
+            pipeline_session_id = self.get_pipeline_session_id(
+                context.conversation_id,
+                client_id,
+                user_id=user_id,
+            )
+            result = self.archi(
+                history=context.history,
+                conversation_id=context.conversation_id,
+                pipeline_session_id=pipeline_session_id,
+            )
+            self._persist_pipeline_session_id_from_output(
+                result,
+                conversation_id=context.conversation_id,
+                client_id=client_id,
+                current_session_id=pipeline_session_id,
+                user_id=user_id,
+            )
             timestamps["chain_finished_ts"] = datetime.now(timezone.utc)
 
             # keep track of total number of queries and log this amount
@@ -1707,26 +1881,24 @@ def _remember_tool_call(tool_call_id: str, tool_name: Any, tool_args: Any) -> No
             requested_config = self._resolve_config_name(config_name)
             self.update_config(config_name=requested_config)
             
-            # If provider and model are specified in the context, override the pipeline's LLM
-            provider = context.provider_used
-            model = context.model_used
+            # Build per-request kwargs for provider/model override
+            stream_kwargs: Dict[str, Any] = {
+                "history": context.history,
+                "conversation_id": context.conversation_id,
+            }
+            pipeline_session_id = self.get_pipeline_session_id(
+                context.conversation_id,
+                client_id,
+                user_id=user_id,
+            )
+            if pipeline_session_id:
+                stream_kwargs["pipeline_session_id"] = pipeline_session_id
             if provider and model:
-                try:
-                    override_llm = self._create_provider_llm(provider, model, provider_api_key)
-                    if override_llm and hasattr(self.archi, 'pipeline') and hasattr(self.archi.pipeline, 'agent_llm'):
-                        original_llm = self.archi.pipeline.agent_llm
-                        self.archi.pipeline.agent_llm = override_llm
-                        # Force agent refresh to use new LLM
-                        if hasattr(self.archi.pipeline, 'refresh_agent'):
-                            self.archi.pipeline.refresh_agent(force=True)
-                        logger.info(f"Overrode pipeline LLM with {provider}/{model}")
-                except ValueError as e:
-                    logger.warning(f"Failed to create provider LLM {provider}/{model}: {e}")
-                    yield {"type": "error", "status": 400, "message": str(e)}
-                    return
-                except Exception as e:
-                    logger.warning(f"Failed to create provider LLM {provider}/{model}: {e}")
-                    yield {"type": "warning", "message": f"Using default model: {e}"}
+                stream_kwargs["provider"] = provider
+                stream_kwargs["model"] = model
+                logger.info(f"Requesting pipeline override: {provider}/{model}")
+            if provider_api_key:
+                stream_kwargs["provider_api_key"] = provider_api_key
             
             # Create trace for this streaming request
             trace_id = self.create_agent_trace(
@@ -1736,7 +1908,7 @@ def _remember_tool_call(tool_call_id: str, tool_name: Any, tool_args: Any) -> No
                 pipeline_name=self.archi.pipeline_name if hasattr(self.archi, 'pipeline_name') else None,
             )
 
-            for output in self.archi.stream(history=context.history, conversation_id=context.conversation_id,model=context.model_used):
+            for output in self.archi.stream(**stream_kwargs):
                 if client_timeout and time.time() - stream_start_time > client_timeout:
                     if trace_id:
                         total_duration_ms = int((time.time() - stream_start_time) * 1000)
@@ -1752,12 +1924,14 @@ def _remember_tool_call(tool_call_id: str, tool_name: Any, tool_args: Any) -> No
                     return
                 last_output = output
                 
-                # Extract event_type from metadata (new structured events from BaseReActAgent)
+                # Extract event_type from metadata (structured events from agent pipeline)
                 event_type = output.metadata.get("event_type", "text") if output.metadata else "text"
                 timestamp = datetime.now(timezone.utc).isoformat()
                 
                 # Handle different event types
                 if event_type == "tool_start":
+                    # Try messages-based path first (classic pipelines),
+                    # fall back to metadata-only (Copilot adapter, decision 16).
                     tool_messages = getattr(output, "messages", []) or []
                     tool_message = tool_messages[0] if tool_messages else None
                     tool_calls = getattr(tool_message, "tool_calls", None) if tool_message else None
@@ -1834,8 +2008,20 @@ def _remember_tool_call(tool_call_id: str, tool_name: Any, tool_args: Any) -> No
                             if tool_call_id in emitted_tool_call_ids:
                                 continue
                             emitted_tool_call_ids.add(tool_call_id)
+                            emitted_tool_start_ids.add(tool_call_id)
                             pending_tool_call_ids.append(tool_call_id)
                             tool_call_count += 1
+                            trace_event = {
+                                "type": "tool_start",
+                                "tool_call_id": tool_call_id,
+                                "tool_name": tool_name,
+                                "tool_args": tool_args,
+                                "timestamp": timestamp,
+                                "conversation_id": context.conversation_id,
+                            }
+                            trace_events.append(trace_event)
+                            if include_tool_steps:
+                                yield trace_event
                     elif memory_args_by_id:
                         for memory_id, memory_call in memory_args_by_id.items():
                             if not isinstance(memory_call, dict):
@@ -1848,19 +2034,60 @@ def _remember_tool_call(tool_call_id: str, tool_name: Any, tool_args: Any) -> No
                             if tool_call_id in emitted_tool_call_ids:
                                 continue
                             emitted_tool_call_ids.add(tool_call_id)
+                            emitted_tool_start_ids.add(tool_call_id)
                             pending_tool_call_ids.append(tool_call_id)
                             _remember_tool_call(tool_call_id, tool_name, tool_args)
                             tool_call_count += 1
+                            trace_event = {
+                                "type": "tool_start",
+                                "tool_call_id": tool_call_id,
+                                "tool_name": tool_name,
+                                "tool_args": tool_args,
+                                "timestamp": timestamp,
+                                "conversation_id": context.conversation_id,
+                            }
+                            trace_events.append(trace_event)
+                            if include_tool_steps:
+                                yield trace_event
+                    elif output.metadata:
+                        # Metadata-only path (Copilot adapter)
+                        meta_tool_call_id = output.metadata.get("tool_call_id", "")
+                        meta_tool_name = output.metadata.get("tool_name", "unknown")
+                        meta_tool_args = output.metadata.get("tool_args", {})
+                        _remember_tool_call(meta_tool_call_id, meta_tool_name, meta_tool_args)
+                        if meta_tool_call_id and meta_tool_call_id in emitted_tool_call_ids:
+                            continue
+                        if meta_tool_call_id:
+                            emitted_tool_call_ids.add(meta_tool_call_id)
+                            emitted_tool_start_ids.add(meta_tool_call_id)
+                        tool_call_count += 1
+                        trace_event = {
+                            "type": "tool_start",
+                            "tool_call_id": meta_tool_call_id,
+                            "tool_name": meta_tool_name,
+                            "tool_args": meta_tool_args,
+                            "timestamp": timestamp,
+                            "conversation_id": context.conversation_id,
+                        }
+                        trace_events.append(trace_event)
+                        if include_tool_steps:
+                            yield trace_event
                         
                 elif event_type == "tool_output":
+                    # Messages-based path (classic) or metadata-only (Copilot adapter)
                     tool_messages = getattr(output, "messages", []) or []
                     tool_message = tool_messages[0] if tool_messages else None
-                    tool_output = self._message_content(tool_message) if tool_message else ""
-                    truncated = len(tool_output) > max_step_chars
-                    full_length = len(tool_output) if truncated else None
-                    display_output = self._truncate_text(tool_output, max_step_chars)
+                    if tool_message:
+                        tool_output_text = self._message_content(tool_message)
+                        tool_call_id = getattr(tool_message, "tool_call_id", "")
+                    else:
+                        tool_output_text = output.metadata.get("output", "") if output.metadata else ""
+                        tool_call_id = output.metadata.get("tool_call_id", "") if output.metadata else ""
+                    truncated = len(tool_output_text) > max_step_chars
+                    full_length = len(tool_output_text) if truncated else None
+                    display_output = self._truncate_text(tool_output_text, max_step_chars)
                     
-                    output_tool_call_id = getattr(tool_message, "tool_call_id", "") if tool_message else ""
+                    output_tool_call_id = getattr(tool_message, "tool_call_id", "") if tool_message else tool_call_id
                     if not output_tool_call_id and pending_tool_call_ids:
                         output_tool_call_id = pending_tool_call_ids.pop(0)
                     elif output_tool_call_id in pending_tool_call_ids:
@@ -1963,6 +2190,14 @@ def _remember_tool_call(tool_call_id: str, tool_name: Any, tool_args: Any) -> No
                 elif event_type == "final":
                     # Final event handled below after loop
                     pass
+                elif event_type == "error":
+                    error_msg = output.metadata.get("error", "Unknown error")
+                    logger.error("Agent pipeline error: %s", error_msg)
+                    yield {
+                        "type": "error",
+                        "message": error_msg,
+                        "timestamp": timestamp,
+                    }
                 else:
                     # Fallback: legacy event handling for non-agent pipelines
                     if getattr(output, "final", False):
@@ -2024,6 +2259,14 @@ def _remember_tool_call(tool_call_id: str, tool_name: Any, tool_args: Any) -> No
             self.number_of_queries += 1
             logger.info(f"Number of queries is: {self.number_of_queries}")
 
+            pipeline_session_id = self._persist_pipeline_session_id_from_output(
+                last_output,
+                conversation_id=context.conversation_id,
+                client_id=client_id,
+                current_session_id=pipeline_session_id,
+                user_id=user_id,
+            )
+
             output, message_ids = self._finalize_result(
                 last_output,
                 context=context,
@@ -2759,16 +3002,18 @@ def get_providers(self):
                 ProviderType,
             )
 
+            session_keys = session.get('provider_api_keys', {})
             providers_data = []
             for provider_type in list_provider_types():
                 try:
                     cfg = _build_provider_config_from_payload(self.config, provider_type)
                     provider = get_provider(provider_type, config=cfg) if cfg else get_provider(provider_type)
                     models = provider.list_models()
+                    has_session_key = provider_type.value in session_keys
                     providers_data.append({
                         'type': provider_type.value,
                         'display_name': provider.display_name,
-                        'enabled': provider.is_enabled,
+                        'enabled': provider.is_enabled or has_session_key,
                         'default_model': provider.config.default_model,
                         'models': [
                             {
@@ -3128,7 +3373,7 @@ def delete_agent_spec(self):
                 dynamic = None
             if dynamic and dynamic.active_agent_name == name:
                 cfg = ConfigService(pg_config=self.pg_config)
-                cfg.update_dynamic_config(active_agent_name=None, updated_by=data.get("client_id") or "system")
+                cfg.update_dynamic_config(active_agent_name="", updated_by=data.get("client_id") or "system")
             return jsonify({"success": True, "deleted": name}), 200
         except Exception as exc:
             logger.error(f"Error deleting agent spec: {exc}")
@@ -3575,7 +3820,7 @@ def get_chat_response(self):
             'conversation_id': conversation_id,
             'archi_msg_id': message_ids[-1],
             'server_response_msg_ts': timestamps['server_response_msg_ts'].timestamp(),
-            'model_used': model,
+            'model_used': self.current_model_used,
             'final_response_msg_ts': datetime.now(timezone.utc).timestamp(),
         }
 
diff --git a/src/interfaces/redmine_mailer_integration/redmine.py b/src/interfaces/redmine_mailer_integration/redmine.py
index 49f17aa87..3f2e665d2 100644
--- a/src/interfaces/redmine_mailer_integration/redmine.py
+++ b/src/interfaces/redmine_mailer_integration/redmine.py
@@ -43,7 +43,7 @@ def __init__(self):
         self.data_path = self.global_config["DATA_PATH"]
 
         # agent
-        agent_class = self.redmine_config.get("agent_class") or self.redmine_config.get("pipeline", "CMSCompOpsAgent")
+        agent_class = self.redmine_config.get("agent_class") or self.redmine_config.get("pipeline", "CopilotAgentPipeline")
         agents_dir = Path(
             self.redmine_config.get("agents_dir")
             or self.services_config.get("chat_app", {}).get("agents_dir", "/root/archi/agents")
diff --git a/src/utils/sql.py b/src/utils/sql.py
index 166644488..cbb6663e8 100644
--- a/src/utils/sql.py
+++ b/src/utils/sql.py
@@ -126,6 +126,12 @@
 WHERE conversation_id = %s AND client_id = %s;
 """
 
+SQL_GET_CONVERSATION_PIPELINE_SESSION_ID = """
+SELECT pipeline_session_id
+FROM conversation_metadata
+WHERE conversation_id = %s AND client_id = %s;
+"""
+
 SQL_DELETE_CONVERSATION = """
 DELETE FROM conversation_metadata
 WHERE conversation_id = %s AND client_id = %s;
@@ -148,6 +154,12 @@
 WHERE conversation_id = %s AND (user_id = %s OR client_id = %s);
 """
 
+SQL_GET_CONVERSATION_PIPELINE_SESSION_ID_BY_USER = """
+SELECT pipeline_session_id
+FROM conversation_metadata
+WHERE conversation_id = %s AND (user_id = %s OR client_id = %s);
+"""
+
 SQL_DELETE_CONVERSATION_BY_USER = """
 DELETE FROM conversation_metadata
 WHERE conversation_id = %s AND (user_id = %s OR client_id = %s);
@@ -159,6 +171,18 @@
 WHERE conversation_id = %s AND (user_id = %s OR client_id = %s);
 """
 
+SQL_UPDATE_CONVERSATION_PIPELINE_SESSION_ID = """
+UPDATE conversation_metadata
+SET pipeline_session_id = %s
+WHERE conversation_id = %s AND client_id = %s;
+"""
+
+SQL_UPDATE_CONVERSATION_PIPELINE_SESSION_ID_BY_USER = """
+UPDATE conversation_metadata
+SET pipeline_session_id = %s
+WHERE conversation_id = %s AND (user_id = %s OR client_id = %s);
+"""
+
 # =============================================================================
 # Tool Calls Queries
 # =============================================================================
diff --git a/tests/_parse_stream.py b/tests/_parse_stream.py
new file mode 100644
index 000000000..ba4b4ee40
--- /dev/null
+++ b/tests/_parse_stream.py
@@ -0,0 +1,32 @@
+"""Parse NDJSON stream output and summarize events."""
+import sys, json
+
+data = open(sys.argv[1]).read().strip()
+# Find where the actual NDJSON starts (skip command echo)
+lines = data.split('\n')
+for line in lines:
+    line = line.strip()
+    if not line:
+        continue
+    try:
+        obj = json.loads(line)
+    except json.JSONDecodeError:
+        continue
+    t = obj.get('type', '')
+    if t == 'tool_start':
+        print(f"TOOL_START: {obj.get('tool_name','?')}")
+    elif t == 'tool_output':
+        out = obj.get('output', '')
+        print(f"TOOL_OUTPUT: ({len(out)} chars) {out[:120]}...")
+    elif t == 'final':
+        resp = obj.get('response', '')
+        print(f"FINAL (len={len(resp)}): {resp[:300]}...")
+        print(f"  usage: {obj.get('usage')}")
+        print(f"  model: {obj.get('model_used')}")
+        print(f"  conversation_id: {obj.get('conversation_id')}")
+    elif t == 'chunk':
+        pass
+    elif t == 'meta':
+        pass
+    else:
+        print(f"{t}: {str(obj)[:150]}")
diff --git a/tests/pr_preview_config/pr_preview_copilot_config.yaml b/tests/pr_preview_config/pr_preview_copilot_config.yaml
new file mode 100644
index 000000000..41a43a5ca
--- /dev/null
+++ b/tests/pr_preview_config/pr_preview_copilot_config.yaml
@@ -0,0 +1,48 @@
+name: pr_preview_copilot_config
+
+services:
+  postgres:
+    port: 3456
+  chat_app:
+    agent_class: CopilotAgentPipeline
+    agents_dir: tests/pr_preview_config/agents
+    default_provider: local
+    default_model: qwen3:4b
+    providers:
+      local:
+        enabled: true
+        base_url: http://localhost:11434  # Ollama endpoint with scheme
+        mode: ollama
+        default_model: "qwen3:4b"
+        models:
+          - "qwen3:4b"
+    port: 2786
+    external_port: 2786
+    trained_on: "Dummy data for preview"
+  # Vector storage uses PostgreSQL with pgvector (only supported backend)
+  data_manager:
+    port: 4242
+    external_port: 4242
+    auth:
+      enabled: false
+
+data_manager:
+  embedding_name: HuggingFaceEmbeddings
+  embedding_class_map:
+    HuggingFaceEmbeddings:
+      class: HuggingFaceEmbeddings
+      kwargs:
+        model_name: sentence-transformers/all-MiniLM-L6-v2
+        model_kwargs:
+          device: cpu
+        encode_kwargs:
+          normalize_embeddings: true
+      similarity_score_reference: 10
+  # PostgreSQL with pgvector is the only supported vector backend
+  use_hybrid_search: true
+  sources:
+    links:
+      enabled: false
+    local_files:
+      paths:
+        - tests/smoke/seed.txt
diff --git a/tests/smoke/combined_smoke.sh b/tests/smoke/combined_smoke.sh
index aef9cec7c..b19c8c2c2 100755
--- a/tests/smoke/combined_smoke.sh
+++ b/tests/smoke/combined_smoke.sh
@@ -53,6 +53,17 @@ if [[ -n "${ollama_host}" ]]; then
     exit 1
   fi
 fi
+
+info "Running deployment preflight checks..."
+"${tool}" exec -i -w /root/archi \
+  -e DM_BASE_URL="${DM_BASE_URL}" \
+  -e OLLAMA_URL="${OLLAMA_URL}" \
+  -e OLLAMA_MODEL="${OLLAMA_MODEL}" \
+  "${container_name}" \
+  python3 - < tests/smoke/deploy_preflight.py || {
+    echo "[combined-smoke] WARNING: deploy_preflight checks had failures (continuing)" >&2
+  }
+
 "${tool}" exec -i -w /root/archi \
   -e ARCHI_CONFIG_NAME="${config_name}" \
   -e ARCHI_CONFIG_PATH="/root/archi/configs/${config_name}.yaml" \
diff --git a/tests/smoke/deploy_preflight.py b/tests/smoke/deploy_preflight.py
new file mode 100644
index 000000000..1dbf6042c
--- /dev/null
+++ b/tests/smoke/deploy_preflight.py
@@ -0,0 +1,295 @@
+#!/usr/bin/env python3
+"""Deployment preflight checks for the A2rchi container environment.
+
+Validates that critical configuration and services are properly set up
+before running smoke tests.  Run inside the container after deployment.
+
+Usage:
+    python3 tests/smoke/deploy_preflight.py
+
+Environment:
+    DM_BASE_URL — data-manager base URL (default: http://localhost:7871)
+    OLLAMA_URL  — Ollama API URL (default: http://localhost:11434)
+
+Exit codes:
+    0 — all checks passed
+    1 — one or more checks failed
+"""
+
+import json
+import os
+import sys
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+_PASS = "\033[92mPASS\033[0m"
+_FAIL = "\033[91mFAIL\033[0m"
+_WARN = "\033[93mWARN\033[0m"
+
+_failures = []
+_warnings = []
+
+
+def check(name, condition, detail=""):
+    if condition:
+        print(f"  [{_PASS}] {name}")
+    else:
+        _failures.append(name)
+        detail_str = f" — {detail}" if detail else ""
+        print(f"  [{_FAIL}] {name}{detail_str}")
+
+
+def warn(name, condition, detail=""):
+    if condition:
+        print(f"  [{_PASS}] {name}")
+    else:
+        _warnings.append(name)
+        detail_str = f" — {detail}" if detail else ""
+        print(f"  [{_WARN}] {name}{detail_str}")
+
+
+# ---------------------------------------------------------------------------
+# 1. Required environment variables
+# ---------------------------------------------------------------------------
+
+
+def check_env_vars():
+    print("\n[deploy-preflight] Checking environment variables...")
+
+    # Critical — deployment fails without these
+    critical_vars = [
+        "PG_PASSWORD",
+        "OLLAMA_URL",
+    ]
+    for var in critical_vars:
+        value = os.environ.get(var, "")
+        check(f"env:{var} is set", bool(value.strip()), f"'{var}' is empty or missing")
+
+    # Important for tool functionality
+    dm_api_token = os.environ.get("DM_API_TOKEN", "")
+    warn(
+        "env:DM_API_TOKEN is set",
+        bool(dm_api_token.strip()),
+        "Tools that query data-manager will fail with auth redirects",
+    )
+
+    # Check GITHUB_TOKEN if not using BYOK-only mode
+    github_token = os.environ.get("GITHUB_TOKEN", "")
+    warn(
+        "env:GITHUB_TOKEN is set (or BYOK-only)",
+        bool(github_token.strip()),
+        "SDK auth may fail if not using BYOK-only mode",
+    )
+
+
+# ---------------------------------------------------------------------------
+# 2. Catalog API reachable and authenticated
+# ---------------------------------------------------------------------------
+
+
+def check_catalog_api():
+    print("\n[deploy-preflight] Checking catalog API...")
+    dm_base = os.environ.get("DM_BASE_URL", "http://localhost:7871")
+    dm_token = os.environ.get("DM_API_TOKEN", "")
+
+    try:
+        import requests
+    except ImportError:
+        warn("requests installed", False, "cannot test catalog API")
+        return
+
+    # Health check
+    try:
+        headers = {}
+        if dm_token:
+            headers["Authorization"] = f"Bearer {dm_token}"
+        resp = requests.get(
+            f"{dm_base}/api/health",
+            headers=headers,
+            timeout=5,
+            allow_redirects=False,
+        )
+        check(
+            "catalog health returns 200 (not redirect)",
+            resp.status_code == 200,
+            f"got {resp.status_code}"
+            + (
+                f" → {resp.headers.get('Location', '?')}"
+                if resp.status_code in (301, 302, 303, 307, 308)
+                else ""
+            ),
+        )
+    except requests.ConnectionError:
+        check("catalog API reachable", False, f"connection refused at {dm_base}")
+    except requests.Timeout:
+        check("catalog API reachable", False, f"timeout at {dm_base}")
+    except Exception as exc:
+        check("catalog API reachable", False, str(exc))
+
+
+# ---------------------------------------------------------------------------
+# 3. Ollama model available
+# ---------------------------------------------------------------------------
+
+
+def check_ollama():
+    print("\n[deploy-preflight] Checking Ollama...")
+    ollama_url = os.environ.get("OLLAMA_URL", "http://localhost:11434")
+    expected_model = os.environ.get("OLLAMA_MODEL", "")
+
+    try:
+        import requests
+    except ImportError:
+        warn("requests installed", False, "cannot test Ollama")
+        return
+
+    try:
+        resp = requests.get(f"{ollama_url}/api/tags", timeout=5)
+        check("Ollama reachable", resp.status_code == 200, f"status={resp.status_code}")
+
+        if resp.status_code == 200 and expected_model:
+            data = resp.json()
+            models = [m.get("name", "") for m in data.get("models", [])]
+            # Match either exact name or name without tag
+            found = any(
+                expected_model == m or expected_model == m.split(":")[0] for m in models
+            )
+            check(
+                f"Ollama model '{expected_model}' available",
+                found,
+                f"available models: {', '.join(models[:5])}",
+            )
+    except requests.ConnectionError:
+        check("Ollama reachable", False, f"connection refused at {ollama_url}")
+    except requests.Timeout:
+        check("Ollama reachable", False, f"timeout at {ollama_url}")
+    except Exception as exc:
+        check("Ollama reachable", False, str(exc))
+
+
+# ---------------------------------------------------------------------------
+# 4. Vectorstore non-empty
+# ---------------------------------------------------------------------------
+
+
+def check_vectorstore():
+    print("\n[deploy-preflight] Checking vectorstore...")
+    pg_host = os.environ.get("PGHOST", os.environ.get("PG_HOST", "localhost"))
+    pg_port = os.environ.get("PGPORT", os.environ.get("PG_PORT", "5432"))
+    pg_user = os.environ.get("PGUSER", os.environ.get("PG_USER", "archi"))
+    pg_pass = os.environ.get("PGPASSWORD", os.environ.get("PG_PASSWORD", ""))
+    pg_db = os.environ.get("PGDATABASE", os.environ.get("PG_DATABASE", "archi"))
+
+    try:
+        import psycopg2
+    except ImportError:
+        warn("psycopg2 installed", False, "cannot check vectorstore")
+        return
+
+    try:
+        conn = psycopg2.connect(
+            host=pg_host,
+            port=pg_port,
+            user=pg_user,
+            password=pg_pass,
+            dbname=pg_db,
+            connect_timeout=5,
+        )
+        cur = conn.cursor()
+        # Check if the embeddings table exists and has rows
+        cur.execute("""
+            SELECT count(*)
+            FROM information_schema.tables
+            WHERE table_name = 'langchain_pg_embedding'
+        """)
+        table_exists = cur.fetchone()[0] > 0
+
+        if table_exists:
+            cur.execute("SELECT count(*) FROM langchain_pg_embedding")
+            row_count = cur.fetchone()[0]
+            check(
+                f"vectorstore has embeddings ({row_count} rows)",
+                row_count > 0,
+                "vectorstore is empty — ingestion may have failed",
+            )
+        else:
+            warn(
+                "vectorstore table exists",
+                False,
+                "langchain_pg_embedding table not found",
+            )
+
+        cur.close()
+        conn.close()
+    except psycopg2.OperationalError as exc:
+        check("vectorstore DB reachable", False, str(exc).strip().split("\n")[0])
+    except Exception as exc:
+        check("vectorstore check", False, str(exc))
+
+
+# ---------------------------------------------------------------------------
+# 5. Code version sentinel
+# ---------------------------------------------------------------------------
+
+
+def check_code_version():
+    print("\n[deploy-preflight] Checking code version...")
+    # Verify the adapter module has poll_timeout parameter (post-fix sentinel)
+    try:
+        import inspect
+
+        from src.archi.pipelines.copilot_agents.copilot_event_adapter import CopilotEventAdapter
+
+        sig = inspect.signature(CopilotEventAdapter.iter_outputs)
+        has_poll_timeout = "poll_timeout" in sig.parameters
+        check(
+            "CopilotEventAdapter.iter_outputs has poll_timeout param",
+            has_poll_timeout,
+            "stale code — missing poll_timeout fix",
+        )
+    except ImportError:
+        warn("copilot_event_adapter importable", False, "module not found")
+    except Exception as exc:
+        warn("code version check", False, str(exc))
+
+
+# ---------------------------------------------------------------------------
+# Main
+# ---------------------------------------------------------------------------
+
+
+def main():
+    print("=" * 60)
+    print("[deploy-preflight] Running deployment validation checks")
+    print("=" * 60)
+
+    check_env_vars()
+    check_catalog_api()
+    check_ollama()
+    check_vectorstore()
+    check_code_version()
+
+    print()
+    if _failures:
+        print(f"[deploy-preflight] {len(_failures)} FAILED check(s):")
+        for f in _failures:
+            print(f"  - {f}")
+    if _warnings:
+        print(f"[deploy-preflight] {len(_warnings)} WARNING(s):")
+        for w in _warnings:
+            print(f"  - {w}")
+
+    if _failures:
+        print(
+            f"\n[deploy-preflight] FAILED — {len(_failures)} critical check(s) did not pass"
+        )
+        sys.exit(1)
+    else:
+        print("\n[deploy-preflight] PASSED — all critical checks OK")
+        sys.exit(0)
+
+
+if __name__ == "__main__":
+    main()
diff --git a/tests/smoke/init-test.sql b/tests/smoke/init-test.sql
index 9d9401bd5..8784ea577 100644
--- a/tests/smoke/init-test.sql
+++ b/tests/smoke/init-test.sql
@@ -341,9 +341,13 @@ CREATE TABLE IF NOT EXISTS conversation_metadata (
     title TEXT,
     created_at TIMESTAMP NOT NULL DEFAULT NOW(),
     last_message_at TIMESTAMP NOT NULL DEFAULT NOW(),
-    archi_version VARCHAR(50)
+    archi_version VARCHAR(50),
+    pipeline_session_id TEXT
 );
 
+ALTER TABLE conversation_metadata
+ADD COLUMN IF NOT EXISTS pipeline_session_id TEXT;
+
 CREATE INDEX IF NOT EXISTS idx_conv_meta_user ON conversation_metadata(user_id);
 CREATE INDEX IF NOT EXISTS idx_conv_meta_client ON conversation_metadata(client_id);
 
@@ -550,4 +554,4 @@ TO grafana;
 -- ============================================================================
 -- 
 -- Grafana queries use model_used and pipeline_used columns directly:
--- SELECT c.*, c.model_used, c.pipeline_used FROM conversations c
\ No newline at end of file
+-- SELECT c.*, c.model_used, c.pipeline_used FROM conversations c
diff --git a/tests/smoke/react_smoke.py b/tests/smoke/react_smoke.py
index df21651da..1d9605f5d 100644
--- a/tests/smoke/react_smoke.py
+++ b/tests/smoke/react_smoke.py
@@ -1,13 +1,16 @@
 #!/usr/bin/env python3
 """ReAct smoke check using chat streaming endpoint."""
+
 import json
 import os
 import sys
 import time
 import uuid
-import requests
 from typing import Tuple
 
+import requests
+
+
 def _fail(message: str) -> None:
     print(f"[react-smoke] ERROR: {message}", file=sys.stderr)
     sys.exit(1)
@@ -21,6 +24,10 @@ def _stream_chat(base_url: str, payload: dict) -> bool:
     stream_url = f"{base_url}/api/get_chat_response_stream"
     _info(f"POST {stream_url}")
     final_seen = False
+    text_events = 0
+    tool_start_events = 0
+    usage_data = None
+    final_content = ""
     try:
         with requests.post(stream_url, json=payload, stream=True, timeout=300) as resp:
             if resp.status_code != 200:
@@ -35,11 +42,40 @@ def _stream_chat(base_url: str, payload: dict) -> bool:
                 event_type = event.get("type")
                 if event_type == "error":
                     _fail(f"Stream error: {event}")
+                if event_type == "chunk":
+                    text_events += 1
+                if event_type == "tool_start":
+                    tool_start_events += 1
+                    _info(f"  tool_start: {event.get('tool_name', '?')}")
                 if event_type == "final":
                     final_seen = True
+                    final_content = event.get("response", "")
+                    usage_data = event.get("usage")
                     break
     except Exception as exc:
         _fail(f"Stream request failed: {exc}")
+
+    # Extended validation
+    if text_events == 0:
+        _info("WARNING: No text chunk events before final")
+    else:
+        _info(f"  {text_events} text chunk(s) received")
+
+    if tool_start_events > 0:
+        _info(f"  {tool_start_events} tool call(s) observed")
+
+    if not final_content:
+        _info("WARNING: final event has empty response content")
+
+    if usage_data:
+        pt = usage_data.get("prompt_tokens", 0)
+        ct = usage_data.get("completion_tokens", 0)
+        _info(f"  usage: {pt} prompt + {ct} completion = {pt + ct} total tokens")
+        if pt == 0:
+            _info("WARNING: prompt_tokens is 0 in usage data")
+    else:
+        _info("WARNING: No usage data in final event")
+
     return final_seen
 
 
diff --git a/tests/smoke/stream_test.py b/tests/smoke/stream_test.py
new file mode 100644
index 000000000..3580992d8
--- /dev/null
+++ b/tests/smoke/stream_test.py
@@ -0,0 +1,99 @@
+#!/usr/bin/env python3
+"""Test streaming events from the chat API."""
+
+import json
+import urllib.request
+
+url = "http://localhost:2786/api/get_chat_response_stream"
+payload = json.dumps(
+    {
+        "last_message": "What is the marker text in the seed files?",
+        "conversation_id": 1,
+        "config_name": "pr_preview_config",
+        "client_id": "61ca7f61-678d-4a19-857e-d6ff38ecbeb1",
+        "include_agent_steps": True,
+        "include_tool_steps": True,
+    }
+).encode()
+
+req = urllib.request.Request(
+    url, data=payload, headers={"Content-Type": "application/json"}
+)
+events = []
+
+with urllib.request.urlopen(req, timeout=120) as resp:
+    for line in resp:
+        line = line.decode().strip()
+        if not line:
+            continue
+        try:
+            ev = json.loads(line)
+            events.append(ev)
+        except json.JSONDecodeError:
+            pass
+
+print("=== EVENT SEQUENCE ===")
+builtin_tools = {"bash", "edit", "read_file", "grep", "task"}
+tools_seen = set()
+
+for i, ev in enumerate(events):
+    t = ev.get("type", "?")
+    extra = ""
+    if t == "meta":
+        extra = f" trace_id={ev.get('trace_id', '?')}"
+    elif t == "tool_start":
+        name = ev.get("tool_name", "?")
+        tools_seen.add(name)
+        extra = f" tool={name} args={json.dumps(ev.get('tool_args', {}))}"
+    elif t == "tool_end":
+        name = ev.get("tool_name", "?")
+        tools_seen.add(name)
+        extra = f" tool={name}"
+    elif t == "tool_output":
+        output_len = len(ev.get("output", ""))
+        extra = f" output_len={output_len} truncated={ev.get('truncated', '?')}"
+    elif t == "text":
+        content = ev.get("content", "")
+        extra = f" len={len(content)}"
+        if len(content) > 50:
+            extra += f" preview={content[:50]}..."
+    elif t == "final":
+        extra = f" msg_id={ev.get('archi_msg_id', '?')} conv_id={ev.get('conversation_id', '?')}"
+    elif t == "usage":
+        extra = f" prompt={ev.get('prompt_tokens')} completion={ev.get('completion_tokens')} total={ev.get('total_tokens')}"
+    elif t == "thinking_start":
+        extra = f" step_id={ev.get('step_id', '?')}"
+    elif t == "thinking_end":
+        extra = f" step_id={ev.get('step_id', '?')}"
+
+    # Only print non-text events (too many text events)
+    if t != "text":
+        print(f"  [{i:3d}] {t}{extra}")
+    if t == "error":
+        print(f"       ERROR DETAIL: {json.dumps(ev)}")
+
+# Summary
+print()
+text_count = sum(1 for e in events if e["type"] == "text")
+print(
+    f"Total events: {len(events)} ({text_count} text tokens + {len(events) - text_count} control events)"
+)
+print(f"Tools seen: {sorted(tools_seen)}")
+print(
+    f"Built-in tools detected: {tools_seen & builtin_tools if tools_seen & builtin_tools else 'NONE'}"
+)
+
+# Check for required event types
+required = {"meta", "thinking_start", "text", "usage", "final"}
+seen_types = {e["type"] for e in events}
+missing = required - seen_types
+print(f"Required event types present: {required & seen_types}")
+if missing:
+    print(f"MISSING required events: {missing}")
+else:
+    print("All required event types present: PASS")
+
+# If tools were used, check for tool events
+if tools_seen:
+    tool_types = {"tool_start", "tool_output"} & seen_types
+    print(f"Tool event types: {tool_types}")
diff --git a/tests/smoke/tools_smoke.py b/tests/smoke/tools_smoke.py
index 491af7442..c4ee1755e 100644
--- a/tests/smoke/tools_smoke.py
+++ b/tests/smoke/tools_smoke.py
@@ -1,18 +1,22 @@
 #!/usr/bin/env python3
-"""Direct tool smoke checks for catalog and vectorstore tools."""
+"""Direct tool smoke checks for catalog and vectorstore tools.
+
+Updated for Copilot SDK: tools are now @define_tool-decorated async
+functions. We test the underlying catalog/retriever operations directly
+and verify that the tool factories produce callable objects.
+"""
+
+import asyncio
 import os
 import sys
 from typing import Dict
 
 import yaml
 
-from src.archi.pipelines.agents.tools import (
-    RemoteCatalogClient,
-    create_document_fetch_tool,
-    create_file_search_tool,
-    create_metadata_search_tool,
-    create_retriever_tool,
-)
+from src.archi.pipelines.agents.tools import RemoteCatalogClient
+from src.archi.pipelines.copilot_agents.tools import (TOOL_REGISTRY, DocumentCollector,
+                             build_document_fetch_tool, build_file_search_tool,
+                             build_metadata_search_tool, build_retriever_tool)
 from src.archi.utils.vectorstore_connector import VectorstoreConnector
 from src.data_manager.vectorstore.retrievers import HybridRetriever
 
@@ -26,14 +30,6 @@ def _info(message: str) -> None:
     print(f"[tools-smoke] {message}")
 
 
-def _invoke_tool(tool, payload: Dict[str, object]) -> str:
-    if hasattr(tool, "invoke"):
-        return tool.invoke(payload)
-    if hasattr(tool, "run"):
-        return tool.run(payload)
-    raise TypeError(f"Unsupported tool type: {type(tool)}")
-
-
 def _load_config() -> Dict:
     config_path = os.getenv("ARCHI_CONFIG_PATH")
     if not config_path:
@@ -75,39 +71,54 @@ def _run_catalog_tools(catalog: RemoteCatalogClient) -> None:
     file_query = os.getenv("FILE_SEARCH_QUERY", "Smoke test seed document")
     metadata_query = os.getenv("METADATA_SEARCH_QUERY", "file_name:seed.txt")
 
-    file_search_tool = create_file_search_tool(catalog)
-    metadata_search_tool = create_metadata_search_tool(catalog)
-    fetch_tool = create_document_fetch_tool(catalog)
-
-    _info("Running file search tool ...")
-    file_result = _invoke_tool(file_search_tool, {"query": file_query})
-    if "failed" in file_result.lower() or "no local files" in file_result.lower():
-        _fail("File search tool returned no results or failed")
-
-    _info("Running metadata search tool ...")
-    meta_result = _invoke_tool(metadata_search_tool, {"query": metadata_query})
-    if "failed" in meta_result.lower() or "no local files" in meta_result.lower():
-        _fail("Metadata search tool returned no results or failed")
-
-    _info("Running document fetch tool ...")
-    hits = catalog.search(metadata_query, limit=1, search_content=False)
-    if not hits:
-        _fail("Metadata search returned no hits; cannot fetch document")
-    resource_hash = hits[0].get("hash")
+    # Verify tool factories produce Tool objects
+    collector = DocumentCollector()
+    file_search_tool = build_file_search_tool(catalog, store_docs=collector.store_docs)
+    metadata_search_tool = build_metadata_search_tool(
+        catalog, store_docs=collector.store_docs
+    )
+    fetch_tool = build_document_fetch_tool(catalog)
+    assert file_search_tool is not None, "build_file_search_tool returned None"
+    assert metadata_search_tool is not None, "build_metadata_search_tool returned None"
+    assert fetch_tool is not None, "build_document_fetch_tool returned None"
+    _info("Tool factories produce Tool objects ✓")
+
+    # Test underlying catalog operations directly
+    _info("Running catalog file search ...")
+    file_results = catalog.search(file_query, limit=3, search_content=True)
+    file_hits = list(file_results)
+    if not file_hits:
+        _fail("Catalog file search returned no results")
+    _info(f"  Found {len(file_hits)} file(s)")
+
+    _info("Running catalog metadata search ...")
+    meta_results = catalog.search(metadata_query, limit=3, search_content=False)
+    meta_hits = list(meta_results)
+    if not meta_hits:
+        _fail("Catalog metadata search returned no results")
+    _info(f"  Found {len(meta_hits)} metadata hit(s)")
+
+    _info("Running document fetch ...")
+    resource_hash = meta_hits[0].get("hash")
     if not resource_hash:
         _fail("Catalog hit missing resource hash")
-    fetch_result = _invoke_tool(fetch_tool, {"resource_hash": resource_hash})
-    if "content:" not in fetch_result.lower():
-        _fail("Document fetch tool returned unexpected output")
+    doc = catalog.get_document(resource_hash, max_chars=4000)
+    if not doc or not doc.get("text"):
+        _fail("Document fetch returned empty content")
+    _info(f"  Fetched document ({len(doc['text'])} chars)")
 
 
 def _run_vectorstore_tool(config: Dict) -> None:
     _map_embedding_classes(config)
     vectorstore = VectorstoreConnector(config).get_vectorstore()
 
-    retriever_cfg = config.get("data_manager", {}).get("retrievers", {}).get("hybrid_retriever")
+    retriever_cfg = (
+        config.get("data_manager", {}).get("retrievers", {}).get("hybrid_retriever")
+    )
     if not retriever_cfg:
-        _fail("Missing data_manager.retrievers.hybrid_retriever config for vectorstore tool")
+        _fail(
+            "Missing data_manager.retrievers.hybrid_retriever config for vectorstore tool"
+        )
 
     hybrid_retriever = HybridRetriever(
         vectorstore=vectorstore,
@@ -116,17 +127,50 @@ def _run_vectorstore_tool(config: Dict) -> None:
         semantic_weight=retriever_cfg["semantic_weight"],
     )
 
-    retriever_tool = create_retriever_tool(hybrid_retriever)
-    query = os.getenv("VECTORSTORE_QUERY", "Smoke test seed document")
+    # Verify factory produces a Tool object
+    collector = DocumentCollector()
+    retriever_tool = build_retriever_tool(hybrid_retriever, store_docs=collector.store_docs)
+    assert retriever_tool is not None, "build_retriever_tool returned None"
+    _info("Retriever tool factory produces Tool object ✓")
 
-    _info("Running vectorstore retriever tool ...")
-    result = _invoke_tool(retriever_tool, {"query": query})
-    if "no documents found" in result.lower():
-        _fail("Vectorstore retriever tool returned no documents")
+    # Test underlying retriever directly
+    query = os.getenv("VECTORSTORE_QUERY", "Smoke test seed document")
+    _info("Running vectorstore retriever ...")
+    results = hybrid_retriever.invoke(query)
+    if not results:
+        _fail("Vectorstore retriever returned no documents")
+    _info(f"  Retrieved {len(results)} document(s)")
+
+
+def _verify_tool_registry() -> None:
+    """Verify TOOL_REGISTRY is consistent and all factories are callable."""
+    _info("Verifying TOOL_REGISTRY ...")
+    expected_tools = {
+        "search_vectorstore_hybrid",
+        "search_local_files",
+        "search_metadata_index",
+        "list_metadata_schema",
+        "fetch_catalog_document",
+        "monit_opensearch_search",
+        "monit_opensearch_aggregation",
+    }
+    actual_tools = set(TOOL_REGISTRY.keys())
+    if actual_tools != expected_tools:
+        missing = expected_tools - actual_tools
+        extra = actual_tools - expected_tools
+        _fail(f"TOOL_REGISTRY mismatch. Missing: {missing}, Extra: {extra}")
+
+    for name, entry in TOOL_REGISTRY.items():
+        if not callable(entry.get("factory")):
+            _fail(f"TOOL_REGISTRY['{name}'].factory is not callable")
+        if not isinstance(entry.get("description"), str):
+            _fail(f"TOOL_REGISTRY['{name}'].description is not a string")
+    _info(f"  All {len(TOOL_REGISTRY)} tools registered correctly ✓")
 
 
 def main() -> None:
     config = _load_config()
+    _verify_tool_registry()
     catalog = _build_catalog_client(config)
     _run_catalog_tools(catalog)
     _run_vectorstore_tool(config)
diff --git a/tests/test_pipeline_matrix.py b/tests/test_pipeline_matrix.py
new file mode 100644
index 000000000..ed7e4f2ec
--- /dev/null
+++ b/tests/test_pipeline_matrix.py
@@ -0,0 +1,505 @@
+"""
+Pipeline Feature Test Matrix
+=============================
+Tests both CMSCompOpsAgent (LangGraph) and CopilotAgentPipeline (Copilot SDK)
+against the same feature matrix to verify interface parity.
+
+Usage:  python tests/test_pipeline_matrix.py
+"""
+
+import sys
+import time
+import traceback
+from typing import Any, Dict, List, Optional, Tuple
+
+# ---------------------------------------------------------------------------
+# Shared config for both pipelines (Ollama on submit76)
+# ---------------------------------------------------------------------------
+SHARED_CONFIG: Dict[str, Any] = {
+    "archi": {},
+    "services": {
+        "chat_app": {
+            "default_provider": "local",
+            "default_model": "qwen3:32b",
+            "providers": {
+                "local": {
+                    "enabled": True,
+                    "base_url": "http://submit76.mit.edu:7870",
+                    "mode": "ollama",
+                    "default_model": "qwen3:32b",
+                    "models": ["qwen3:32b"],
+                },
+            },
+        },
+    },
+    "data_manager": {},
+}
+
+CONSTRUCTOR_KWARGS = {
+    "default_provider": "local",
+    "default_model": "qwen3:32b",
+}
+
+
+# ---------------------------------------------------------------------------
+# Test result tracking
+# ---------------------------------------------------------------------------
+class TestResult:
+    def __init__(self, name: str, pipeline: str):
+        self.name = name
+        self.pipeline = pipeline
+        self.passed = False
+        self.error: Optional[str] = None
+        self.details: Optional[str] = None
+
+    def ok(self, details: str = ""):
+        self.passed = True
+        self.details = details
+        return self
+
+    def fail(self, error: str):
+        self.passed = False
+        self.error = error
+        return self
+
+
+results: List[TestResult] = []
+
+
+def test(name: str, pipeline: str) -> TestResult:
+    r = TestResult(name, pipeline)
+    results.append(r)
+    return r
+
+
+# ---------------------------------------------------------------------------
+# Import both pipeline classes
+# ---------------------------------------------------------------------------
+print("=" * 70)
+print("PIPELINE FEATURE TEST MATRIX")
+print("=" * 70)
+
+print("\nImporting pipeline classes...")
+try:
+    from src.archi.pipelines.agents.cms_comp_ops_agent import CMSCompOpsAgent
+
+    print("  CMSCompOpsAgent: OK")
+except Exception as e:
+    print(f"  CMSCompOpsAgent: FAILED - {e}")
+    sys.exit(1)
+
+try:
+    from src.archi.pipelines.copilot_agents.copilot_agent import CopilotAgentPipeline
+
+    print("  CopilotAgentPipeline: OK")
+except Exception as e:
+    print(f"  CopilotAgentPipeline: FAILED - {e}")
+    sys.exit(1)
+
+from src.archi.utils.output_dataclass import PipelineOutput
+
+# ===================================================================
+# TEST MATRIX
+# ===================================================================
+
+PIPELINES: Dict[str, Any] = {}
+
+# ----- T1: Instantiation -----
+print("\n--- T1: Instantiation ---")
+for label, cls in [("LangGraph", CMSCompOpsAgent), ("Copilot", CopilotAgentPipeline)]:
+    r = test("T1: Instantiation", label)
+    try:
+        agent = cls(config=SHARED_CONFIG, **CONSTRUCTOR_KWARGS)
+        PIPELINES[label] = agent
+        r.ok(f"type={type(agent).__name__}")
+        print(f"  [{label}] PASS — {r.details}")
+    except Exception as e:
+        r.fail(str(e))
+        print(f"  [{label}] FAIL — {e}")
+        traceback.print_exc()
+
+# ----- T2: Has required methods -----
+print("\n--- T2: Required methods ---")
+REQUIRED_METHODS = [
+    "invoke",
+    "stream",
+    "astream",
+    "get_tool_registry",
+    "get_tool_descriptions",
+]
+for label, agent in PIPELINES.items():
+    for method in REQUIRED_METHODS:
+        r = test(f"T2: has {method}()", label)
+        if callable(getattr(agent, method, None)):
+            r.ok()
+            print(f"  [{label}] {method}(): PASS")
+        else:
+            r.fail(f"missing or not callable")
+            print(f"  [{label}] {method}(): FAIL")
+
+# ----- T3: get_tool_registry() returns valid mapping -----
+print("\n--- T3: Tool registry ---")
+for label, agent in PIPELINES.items():
+    r = test("T3: get_tool_registry()", label)
+    try:
+        registry = agent.get_tool_registry()
+        if not isinstance(registry, dict):
+            r.fail(f"returned {type(registry).__name__}, expected dict")
+        elif len(registry) == 0:
+            r.fail("empty registry")
+        else:
+            tool_names = sorted(registry.keys())
+            r.ok(f"{len(registry)} tools: {tool_names}")
+        print(f"  [{label}] {r.passed and 'PASS' or 'FAIL'} — {r.details or r.error}")
+    except Exception as e:
+        r.fail(str(e))
+        print(f"  [{label}] FAIL — {e}")
+
+# ----- T4: get_tool_descriptions() returns valid mapping -----
+print("\n--- T4: Tool descriptions ---")
+for label, agent in PIPELINES.items():
+    r = test("T4: get_tool_descriptions()", label)
+    try:
+        descs = agent.get_tool_descriptions()
+        if not isinstance(descs, dict):
+            r.fail(f"returned {type(descs).__name__}, expected dict")
+        elif len(descs) == 0:
+            r.fail("empty descriptions")
+        else:
+            all_have_desc = all(
+                isinstance(v, str) and len(v) > 0 for v in descs.values()
+            )
+            r.ok(f"{len(descs)} descriptions, all non-empty={all_have_desc}")
+        print(f"  [{label}] {r.passed and 'PASS' or 'FAIL'} — {r.details or r.error}")
+    except Exception as e:
+        r.fail(str(e))
+        print(f"  [{label}] FAIL — {e}")
+
+# ----- T5: stream() — basic text response -----
+print("\n--- T5: stream() basic text ---")
+for label, agent in PIPELINES.items():
+    r = test("T5: stream() basic text", label)
+    try:
+        history = [("user", "What is 2+2? Answer only with the number.")]
+        outputs = list(agent.stream(history=history))
+        if not outputs:
+            r.fail("no outputs")
+        else:
+            event_types = [
+                o.metadata.get("event_type", "?") if o.metadata else "?"
+                for o in outputs
+            ]
+            final = outputs[-1]
+            has_final = "final" in event_types
+            has_answer = bool(final.answer and final.answer.strip())
+            if not has_final:
+                r.fail(f"no 'final' event. Got: {event_types}")
+            elif not has_answer:
+                r.fail(f"empty final answer")
+            else:
+                r.ok(
+                    f"{len(outputs)} events, types={event_types}, "
+                    f"answer='{final.answer[:60]}'"
+                )
+        print(f"  [{label}] {r.passed and 'PASS' or 'FAIL'} — {r.details or r.error}")
+    except Exception as e:
+        r.fail(str(e))
+        print(f"  [{label}] FAIL — {e}")
+        traceback.print_exc()
+
+# ----- T6: stream() returns PipelineOutput instances -----
+print("\n--- T6: All outputs are PipelineOutput ---")
+for label, agent in PIPELINES.items():
+    r = test("T6: PipelineOutput type check", label)
+    try:
+        history = [("user", "Say hello")]
+        outputs = list(agent.stream(history=history))
+        non_po = [
+            type(o).__name__ for o in outputs if not isinstance(o, PipelineOutput)
+        ]
+        if non_po:
+            r.fail(f"non-PipelineOutput types: {non_po}")
+        else:
+            r.ok(f"all {len(outputs)} outputs are PipelineOutput")
+        print(f"  [{label}] {r.passed and 'PASS' or 'FAIL'} — {r.details or r.error}")
+    except Exception as e:
+        r.fail(str(e))
+        print(f"  [{label}] FAIL — {e}")
+        traceback.print_exc()
+
+# ----- T7: Event type sequence -----
+print("\n--- T7: Event type sequence ---")
+for label, agent in PIPELINES.items():
+    r = test("T7: Event types present", label)
+    try:
+        history = [("user", "What is the capital of France? Be brief.")]
+        outputs = list(agent.stream(history=history))
+        event_types = [
+            o.metadata.get("event_type", "?") if o.metadata else "?" for o in outputs
+        ]
+        has_text = "text" in event_types
+        has_final = "final" in event_types
+        has_thinking = "thinking_start" in event_types or "thinking_end" in event_types
+        if not has_final:
+            r.fail(f"missing 'final' event. Got: {event_types}")
+        elif not has_text:
+            r.fail(f"missing 'text' event. Got: {event_types}")
+        else:
+            r.ok(
+                f"text={has_text}, final={has_final}, thinking={has_thinking}, all={event_types}"
+            )
+        print(f"  [{label}] {r.passed and 'PASS' or 'FAIL'} — {r.details or r.error}")
+    except Exception as e:
+        r.fail(str(e))
+        print(f"  [{label}] FAIL — {e}")
+        traceback.print_exc()
+
+# ----- T8: Final output has usage metadata -----
+print("\n--- T8: Final output metadata ---")
+for label, agent in PIPELINES.items():
+    r = test("T8: Final has usage/model", label)
+    try:
+        history = [("user", "Say OK")]
+        outputs = list(agent.stream(history=history))
+        finals = [
+            o for o in outputs if o.metadata and o.metadata.get("event_type") == "final"
+        ]
+        if not finals:
+            r.fail("no final event")
+        else:
+            f = finals[-1]
+            usage = f.metadata.get("usage")
+            model = f.metadata.get("model")
+            has_usage = usage is not None
+            has_model = model is not None
+            r.ok(f"usage={usage}, model={model}")
+        print(f"  [{label}] {r.passed and 'PASS' or 'FAIL'} — {r.details or r.error}")
+    except Exception as e:
+        r.fail(str(e))
+        print(f"  [{label}] FAIL — {e}")
+        traceback.print_exc()
+
+# ----- T9: invoke() returns single PipelineOutput -----
+print("\n--- T9: invoke() ---")
+for label, agent in PIPELINES.items():
+    r = test("T9: invoke()", label)
+    try:
+        history = [("user", "What is 3+3? Just the number.")]
+        result = agent.invoke(history=history)
+        if not isinstance(result, PipelineOutput):
+            r.fail(f"returned {type(result).__name__}")
+        elif not result.answer or not result.answer.strip():
+            r.fail("empty answer")
+        else:
+            r.ok(f"answer='{result.answer[:60]}'")
+        print(f"  [{label}] {r.passed and 'PASS' or 'FAIL'} — {r.details or r.error}")
+    except Exception as e:
+        r.fail(str(e))
+        print(f"  [{label}] FAIL — {e}")
+        traceback.print_exc()
+
+# ----- T10: conversation_id is accepted -----
+print("\n--- T10: conversation_id kwarg ---")
+for label, agent in PIPELINES.items():
+    r = test("T10: conversation_id kwarg", label)
+    try:
+        history = [("user", "Say yes")]
+        outputs = list(agent.stream(history=history, conversation_id=12345))
+        if not outputs:
+            r.fail("no outputs")
+        else:
+            final = outputs[-1]
+            r.ok(f"accepted conversation_id=12345, answer='{final.answer[:40]}'")
+        print(f"  [{label}] {r.passed and 'PASS' or 'FAIL'} — {r.details or r.error}")
+    except Exception as e:
+        r.fail(str(e))
+        print(f"  [{label}] FAIL — {e}")
+        traceback.print_exc()
+
+# ----- T11: Multi-turn history -----
+print("\n--- T11: Multi-turn history ---")
+for label, agent in PIPELINES.items():
+    r = test("T11: Multi-turn history", label)
+    try:
+        history = [
+            ("user", "My name is Alice."),
+            ("assistant", "Hello Alice! How can I help you?"),
+            ("user", "What is my name? Just say the name."),
+        ]
+        outputs = list(agent.stream(history=history))
+        final = outputs[-1] if outputs else None
+        if not final or not final.answer:
+            r.fail("no final answer")
+        elif "alice" in final.answer.lower():
+            r.ok(f"correctly recalled 'Alice' — answer='{final.answer[:60]}'")
+        else:
+            r.fail(f"did not recall 'Alice' — answer='{final.answer[:60]}'")
+        print(f"  [{label}] {r.passed and 'PASS' or 'FAIL'} — {r.details or r.error}")
+    except Exception as e:
+        r.fail(str(e))
+        print(f"  [{label}] FAIL — {e}")
+        traceback.print_exc()
+
+# ----- T12: agent_spec support -----
+print("\n--- T12: agent_spec support ---")
+for label, cls in [("LangGraph", CMSCompOpsAgent), ("Copilot", CopilotAgentPipeline)]:
+    r = test("T12: agent_spec", label)
+    try:
+
+        class FakeAgentSpec:
+            name = "TestAgent"
+            prompt = "You are a pirate. Always respond in pirate speak."
+            tools = []
+
+        agent_with_spec = cls(
+            config=SHARED_CONFIG,
+            agent_spec=FakeAgentSpec(),
+            **CONSTRUCTOR_KWARGS,
+        )
+        history = [("user", "Hello, who are you?")]
+        outputs = list(agent_with_spec.stream(history=history))
+        final = outputs[-1] if outputs else None
+        if not final or not final.answer:
+            r.fail("no final answer")
+        else:
+            answer_lower = final.answer.lower()
+            pirate_words = [
+                "ahoy",
+                "matey",
+                "arr",
+                "ye",
+                "pirate",
+                "captain",
+                "sea",
+                "ship",
+                "sail",
+                "treasure",
+            ]
+            has_pirate = any(w in answer_lower for w in pirate_words)
+            if has_pirate:
+                r.ok(f"pirate_words={has_pirate}, answer='{final.answer[:80]}'")
+            else:
+                r.fail(
+                    f"agent_spec prompt not reflected — answer='{final.answer[:80]}'"
+                )
+        print(f"  [{label}] {r.passed and 'PASS' or 'FAIL'} — {r.details or r.error}")
+    except Exception as e:
+        r.fail(str(e))
+        print(f"  [{label}] FAIL — {e}")
+        traceback.print_exc()
+
+# ----- T13: Accumulated text contract -----
+print("\n--- T13: Text events have content ---")
+for label, agent in PIPELINES.items():
+    r = test("T13: Text events have content", label)
+    try:
+        history = [("user", "Count from 1 to 5.")]
+        outputs = list(agent.stream(history=history))
+        text_events = [
+            o for o in outputs if o.metadata and o.metadata.get("event_type") == "text"
+        ]
+        if not text_events:
+            r.fail("no text events emitted")
+        else:
+            non_empty = [o for o in text_events if o.answer and o.answer.strip()]
+            r.ok(f"{len(text_events)} text events, {len(non_empty)} with content")
+        print(f"  [{label}] {r.passed and 'PASS' or 'FAIL'} — {r.details or r.error}")
+    except Exception as e:
+        r.fail(str(e))
+        print(f"  [{label}] FAIL — {e}")
+        traceback.print_exc()
+
+# ----- T14: Final event is last -----
+print("\n--- T14: Final event is last ---")
+for label, agent in PIPELINES.items():
+    r = test("T14: Final event is last", label)
+    try:
+        history = [("user", "Say done.")]
+        outputs = list(agent.stream(history=history))
+        if not outputs:
+            r.fail("no outputs")
+        else:
+            last = outputs[-1]
+            is_final = last.metadata and last.metadata.get("event_type") == "final"
+            if not is_final:
+                last_type = (
+                    last.metadata.get("event_type") if last.metadata else "no metadata"
+                )
+                r.fail(f"last event is '{last_type}', not 'final'")
+            else:
+                r.ok(f"final is last of {len(outputs)} events")
+        print(f"  [{label}] {r.passed and 'PASS' or 'FAIL'} — {r.details or r.error}")
+    except Exception as e:
+        r.fail(str(e))
+        print(f"  [{label}] FAIL — {e}")
+        traceback.print_exc()
+
+# ----- T15: Empty extra kwargs don't crash -----
+print("\n--- T15: Extra kwargs tolerance ---")
+for label, agent in PIPELINES.items():
+    r = test("T15: Extra kwargs tolerance", label)
+    try:
+        history = [("user", "Say OK")]
+        outputs = list(
+            agent.stream(
+                history=history,
+                conversation_id=None,
+                vectorstore=None,
+                user_id=None,
+            )
+        )
+        final = outputs[-1] if outputs else None
+        if not final or not final.answer:
+            r.fail("no answer with extra None kwargs")
+        else:
+            r.ok(f"accepted None kwargs, answer='{final.answer[:40]}'")
+        print(f"  [{label}] {r.passed and 'PASS' or 'FAIL'} — {r.details or r.error}")
+    except Exception as e:
+        r.fail(str(e))
+        print(f"  [{label}] FAIL — {e}")
+        traceback.print_exc()
+
+
+# ===================================================================
+# RESULTS SUMMARY
+# ===================================================================
+print("\n")
+print("=" * 70)
+print("RESULTS SUMMARY")
+print("=" * 70)
+
+# Group by test name
+from collections import OrderedDict
+
+matrix: OrderedDict[str, Dict[str, TestResult]] = OrderedDict()
+for r in results:
+    if r.name not in matrix:
+        matrix[r.name] = {}
+    matrix[r.name][r.pipeline] = r
+
+print(f"\n{'Test':<40} {'LangGraph':<12} {'Copilot':<12}")
+print("-" * 64)
+for test_name, by_pipeline in matrix.items():
+    lg = by_pipeline.get("LangGraph")
+    cp = by_pipeline.get("Copilot")
+    lg_str = "PASS" if lg and lg.passed else "FAIL" if lg else "SKIP"
+    cp_str = "PASS" if cp and cp.passed else "FAIL" if cp else "SKIP"
+    print(f"{test_name:<40} {lg_str:<12} {cp_str:<12}")
+
+total = len(results)
+passed = sum(1 for r in results if r.passed)
+failed = sum(1 for r in results if not r.passed)
+
+print(f"\n{'Total':<40} {total}")
+print(f"{'Passed':<40} {passed}")
+print(f"{'Failed':<40} {failed}")
+
+# Print failures
+if failed:
+    print("\n--- FAILURES ---")
+    for r in results:
+        if not r.passed:
+            print(f"  [{r.pipeline}] {r.name}: {r.error}")
+
+print()
+sys.exit(0 if failed == 0 else 1)
diff --git a/tests/ui/fixtures.ts b/tests/ui/fixtures.ts
index 669da9a7a..931c22232 100644
--- a/tests/ui/fixtures.ts
+++ b/tests/ui/fixtures.ts
@@ -70,6 +70,24 @@ export const mockData = {
       { provider: 'openai', display_name: 'OpenAI', configured: false, has_session_key: false },
     ],
   },
+
+  agents: {
+    agents: [
+      { name: 'CMS Comp Ops', filename: 'cms-comp-ops.md' },
+      { name: 'Test Agent', filename: 'test-agent.md' },
+    ],
+    active_name: 'CMS Comp Ops',
+  },
+
+  agentTemplate: {
+    name: 'New Agent',
+    tools: [
+      { name: 'search_knowledge_base', description: 'Search the knowledge base' },
+      { name: 'search_local_files', description: 'Search uploaded local files' },
+      { name: 'search_metadata_index', description: 'Search the metadata index' },
+    ],
+    template: '---\nname: New Agent\ntools:\n  - search_knowledge_base\n  - search_local_files\n  - search_metadata_index\n---\n\nWrite your system prompt here.\n\n',
+  },
 };
 
 // =============================================================================
@@ -153,6 +171,10 @@ export async function setupBasicMocks(page: Page) {
   await page.route('**/api/new_conversation', async (route) => {
     await route.fulfill({ status: 200, json: { conversation_id: null } });
   });
+
+  await page.route('**/api/agents/list', async (route) => {
+    await route.fulfill({ status: 200, json: mockData.agents });
+  });
 }
 
 export async function setupStreamMock(page: Page, response: string, delay = 0) {
diff --git a/tests/ui/workflows/21-agent-management.spec.ts b/tests/ui/workflows/21-agent-management.spec.ts
new file mode 100644
index 000000000..0c0e7705e
--- /dev/null
+++ b/tests/ui/workflows/21-agent-management.spec.ts
@@ -0,0 +1,326 @@
+/**
+ * Workflow 21: Agent Management Tests
+ *
+ * Tests for agent CRUD operations: listing, creating, editing,
+ * activating, and deleting custom agents via the agent dropdown
+ * and agent spec editor.
+ */
+import { test, expect, setupBasicMocks, mockData } from '../fixtures';
+
+test.describe('Agent Dropdown', () => {
+  test.beforeEach(async ({ page }) => {
+    await setupBasicMocks(page);
+  });
+
+  test('agent dropdown shows active agent name', async ({ page }) => {
+    await page.goto('/chat');
+
+    const label = page.locator('.agent-dropdown-label');
+    await expect(label).toBeVisible();
+    await expect(label).toContainText('CMS Comp Ops');
+  });
+
+  test('agent dropdown lists all agents with active checkmark', async ({ page }) => {
+    await page.goto('/chat');
+
+    await page.locator('.agent-dropdown-btn').click();
+    const items = page.locator('.agent-dropdown-item');
+    await expect(items).toHaveCount(2);
+
+    // Active agent has .active class
+    await expect(items.first()).toHaveClass(/active/);
+  });
+
+  test('clicking agent item activates it', async ({ page }) => {
+    let activatePayload: any = null;
+
+    await page.route('**/api/agents/active', async (route) => {
+      activatePayload = route.request().postDataJSON();
+      await route.fulfill({
+        status: 200,
+        json: { success: true, active_name: activatePayload.name },
+      });
+    });
+
+    await page.goto('/chat');
+
+    await page.locator('.agent-dropdown-btn').click();
+    // Click on the second (non-active) agent
+    await page.locator('.agent-dropdown-item').nth(1).locator('.agent-dropdown-name').click();
+
+    // Verify API was called
+    expect(activatePayload).not.toBeNull();
+    expect(activatePayload.name).toBe('Test Agent');
+  });
+
+  test('dropdown shows edit and delete buttons per agent', async ({ page }) => {
+    await page.goto('/chat');
+
+    await page.locator('.agent-dropdown-btn').click();
+    const firstItem = page.locator('.agent-dropdown-item').first();
+    await expect(firstItem.locator('.agent-dropdown-edit')).toBeVisible();
+    await expect(firstItem.locator('.agent-dropdown-delete')).toBeVisible();
+  });
+
+  test('delete shows confirmation and completes on confirm', async ({ page }) => {
+    let deletePayload: any = null;
+
+    await page.route('**/api/agents', async (route) => {
+      if (route.request().method() === 'DELETE') {
+        deletePayload = route.request().postDataJSON();
+        await route.fulfill({ status: 200, json: { success: true, deleted: deletePayload.name } });
+      } else {
+        await route.continue();
+      }
+    });
+
+    await page.goto('/chat');
+
+    await page.locator('.agent-dropdown-btn').click();
+    await expect(page.locator('.agent-dropdown-menu')).not.toHaveAttribute('hidden', '');
+
+    // Click delete on the second agent — replaces row with confirmation prompt.
+    // The dropdown may close on click propagation, so use dispatchEvent to
+    // perform the confirmation entirely within the dropdown's click handler.
+    await page.locator('.agent-dropdown-item').nth(1).locator('.agent-dropdown-delete').click();
+
+    // Re-open the dropdown if it closed after the delete button click
+    const menu = page.locator('.agent-dropdown-menu');
+    if (await menu.getAttribute('hidden') !== null) {
+      await page.locator('.agent-dropdown-btn').click();
+    }
+
+    // Click the "Delete" confirmation button
+    await page.locator('.agent-dropdown-confirm-yes').click({ force: true });
+
+    expect(deletePayload).not.toBeNull();
+  });
+});
+
+test.describe('Agent Spec Editor — Create', () => {
+  test.beforeEach(async ({ page }) => {
+    await setupBasicMocks(page);
+
+    await page.route('**/api/agents/template*', async (route) => {
+      await route.fulfill({ status: 200, json: mockData.agentTemplate });
+    });
+  });
+
+  test('add button opens agent spec editor in create mode', async ({ page }) => {
+    await page.goto('/chat');
+
+    await page.locator('.agent-dropdown-btn').click();
+    await page.locator('.agent-dropdown-add').click();
+
+    const modal = page.locator('.agent-spec-modal');
+    await expect(modal).toBeVisible();
+
+    // Title should say "New Agent"
+    const title = page.locator('#agent-spec-title');
+    await expect(title).toContainText('New Agent');
+  });
+
+  test('create mode loads template with tool palette', async ({ page }) => {
+    await page.goto('/chat');
+
+    await page.locator('.agent-dropdown-btn').click();
+    await page.locator('.agent-dropdown-add').click();
+
+    // Tool palette should list tools from template
+    const toolsList = page.locator('.agent-spec-tools-list');
+    await expect(toolsList).toBeVisible({ timeout: 3000 });
+  });
+
+  test('saving agent spec calls POST /api/agents', async ({ page }) => {
+    let savePayload: any = null;
+
+    await page.route('**/api/agents', async (route) => {
+      if (route.request().method() === 'POST') {
+        savePayload = route.request().postDataJSON();
+        await route.fulfill({
+          status: 200,
+          json: { success: true, name: 'My Custom Agent', filename: 'my-custom-agent.md' },
+        });
+      } else {
+        await route.continue();
+      }
+    });
+
+    await page.goto('/chat');
+
+    await page.locator('.agent-dropdown-btn').click();
+    await page.locator('.agent-dropdown-add').click();
+
+    // Fill in the name
+    const nameInput = page.locator('#agent-spec-name');
+    await nameInput.fill('My Custom Agent');
+
+    // Fill in the prompt
+    const promptInput = page.locator('#agent-spec-prompt');
+    await promptInput.fill('You are a helpful assistant.');
+
+    // Click Save
+    await page.locator('.agent-spec-save').click();
+
+    // Verify save was called
+    expect(savePayload).not.toBeNull();
+    expect(savePayload.content).toBeDefined();
+  });
+
+  test('close button closes the spec editor', async ({ page }) => {
+    await page.goto('/chat');
+
+    await page.locator('.agent-dropdown-btn').click();
+    await page.locator('.agent-dropdown-add').click();
+
+    await expect(page.locator('.agent-spec-modal')).toBeVisible();
+    await page.locator('.agent-spec-close').click();
+    await expect(page.locator('.agent-spec-modal')).not.toBeVisible();
+  });
+
+  test('Escape key closes the spec editor', async ({ page }) => {
+    await page.goto('/chat');
+
+    await page.locator('.agent-dropdown-btn').click();
+    await page.locator('.agent-dropdown-add').click();
+
+    await expect(page.locator('.agent-spec-modal')).toBeVisible();
+    await page.keyboard.press('Escape');
+    await expect(page.locator('.agent-spec-modal')).not.toBeVisible();
+  });
+});
+
+test.describe('Agent Spec Editor — Edit', () => {
+  test.beforeEach(async ({ page }) => {
+    await setupBasicMocks(page);
+
+    await page.route('**/api/agents/template*', async (route) => {
+      await route.fulfill({ status: 200, json: mockData.agentTemplate });
+    });
+
+    await page.route('**/api/agents/spec*', async (route) => {
+      await route.fulfill({
+        status: 200,
+        json: {
+          name: 'CMS Comp Ops',
+          filename: 'cms-comp-ops.md',
+          content: '---\nname: CMS Comp Ops\ntools:\n  - search_knowledge_base\n  - search_local_files\n---\n\nYou are a CMS Computing Operations assistant.\n\n',
+        },
+      });
+    });
+  });
+
+  test('edit button opens spec editor in edit mode', async ({ page }) => {
+    await page.goto('/chat');
+
+    await page.locator('.agent-dropdown-btn').click();
+    // Click edit on the first agent
+    await page.locator('.agent-dropdown-item').first().locator('.agent-dropdown-edit').click();
+
+    const modal = page.locator('.agent-spec-modal');
+    await expect(modal).toBeVisible();
+
+    // Title should show edit mode
+    const title = page.locator('#agent-spec-title');
+    await expect(title).toContainText('Edit');
+  });
+
+  test('edit mode loads existing agent content', async ({ page }) => {
+    await page.goto('/chat');
+
+    await page.locator('.agent-dropdown-btn').click();
+    await page.locator('.agent-dropdown-item').first().locator('.agent-dropdown-edit').click();
+
+    // Name should be populated
+    const nameInput = page.locator('#agent-spec-name');
+    await expect(nameInput).toHaveValue('CMS Comp Ops', { timeout: 3000 });
+  });
+
+  test('save in edit mode sends existing_name', async ({ page }) => {
+    let savePayload: any = null;
+
+    await page.route('**/api/agents', async (route) => {
+      if (route.request().method() === 'POST') {
+        savePayload = route.request().postDataJSON();
+        await route.fulfill({
+          status: 200,
+          json: { success: true, name: 'CMS Comp Ops', filename: 'cms-comp-ops.md' },
+        });
+      } else {
+        await route.continue();
+      }
+    });
+
+    await page.goto('/chat');
+
+    await page.locator('.agent-dropdown-btn').click();
+    await page.locator('.agent-dropdown-item').first().locator('.agent-dropdown-edit').click();
+
+    // Wait for content to load
+    await expect(page.locator('#agent-spec-name')).toHaveValue('CMS Comp Ops', { timeout: 3000 });
+
+    // Modify prompt
+    const promptInput = page.locator('#agent-spec-prompt');
+    await promptInput.fill('Updated prompt for CMS Comp Ops.');
+
+    // Save
+    await page.locator('.agent-spec-save').click();
+
+    expect(savePayload).not.toBeNull();
+    expect(savePayload.mode).toBe('edit');
+    expect(savePayload.existing_name).toBe('CMS Comp Ops');
+  });
+});
+
+test.describe('Agent Error Handling', () => {
+  test.beforeEach(async ({ page }) => {
+    await setupBasicMocks(page);
+  });
+
+  test('duplicate agent name shows error', async ({ page }) => {
+    await page.route('**/api/agents/template*', async (route) => {
+      await route.fulfill({ status: 200, json: mockData.agentTemplate });
+    });
+
+    await page.route('**/api/agents', async (route) => {
+      if (route.request().method() === 'POST') {
+        await route.fulfill({
+          status: 409,
+          json: { error: "Agent name 'CMS Comp Ops' already exists" },
+        });
+      } else {
+        await route.continue();
+      }
+    });
+
+    await page.goto('/chat');
+
+    await page.locator('.agent-dropdown-btn').click();
+    await page.locator('.agent-dropdown-add').click();
+
+    const nameInput = page.locator('#agent-spec-name');
+    await nameInput.fill('CMS Comp Ops');
+
+    const promptInput = page.locator('#agent-spec-prompt');
+    await promptInput.fill('Duplicate test.');
+
+    await page.locator('.agent-spec-save').click();
+
+    // Should show an error status
+    const status = page.locator('#agent-spec-status');
+    await expect(status).toBeVisible({ timeout: 3000 });
+    await expect(status).toContainText('already exists');
+  });
+
+  test('agent list handles API failure gracefully', async ({ page }) => {
+    // Override the agents mock to return an error
+    await page.route('**/api/agents/list', async (route) => {
+      await route.fulfill({ status: 500, json: { error: 'Database error' } });
+    });
+
+    await page.goto('/chat');
+
+    // Page should still load even if agents list fails
+    await expect(page.getByLabel('Message input')).toBeVisible();
+  });
+});
diff --git a/tests/ui/workflows/22-copilot-streaming.spec.ts b/tests/ui/workflows/22-copilot-streaming.spec.ts
new file mode 100644
index 000000000..68e3fdca1
--- /dev/null
+++ b/tests/ui/workflows/22-copilot-streaming.spec.ts
@@ -0,0 +1,373 @@
+/**
+ * Workflow 22: Copilot SDK Streaming Edge Cases
+ *
+ * Tests for edge cases in the Copilot SDK agent streaming pipeline:
+ * multiple tool calls, thinking + tools combos, error events,
+ * orphan tool completions, and usage accumulation.
+ */
+import { test, expect, setupBasicMocks, createToolCallEvents } from '../fixtures';
+
+test.describe('Multiple Tool Calls', () => {
+  test.beforeEach(async ({ page }) => {
+    await setupBasicMocks(page);
+  });
+
+  test('response with two sequential tool calls displays both', async ({ page }) => {
+    const tool1 = createToolCallEvents('search_knowledge_base', { query: 'Rucio' }, 'KB results', {
+      toolCallId: 'tc_1', durationMs: 200,
+    });
+    const tool2 = createToolCallEvents('search_local_files', { query: 'logs' }, 'File results', {
+      toolCallId: 'tc_2', durationMs: 350,
+    });
+
+    await page.route('**/api/get_chat_response_stream', async (route) => {
+      const events = [
+        ...tool1.map(e => JSON.stringify(e)),
+        ...tool2.map(e => JSON.stringify(e)),
+        '{"type":"final","response":"Based on both searches, here is the answer.","message_id":1,"user_message_id":1,"conversation_id":1}',
+      ];
+      await route.fulfill({
+        status: 200,
+        contentType: 'text/plain',
+        body: events.join('\n') + '\n',
+      });
+    });
+
+    await page.goto('/chat');
+
+    await page.getByLabel('Message input').fill('Search everything');
+    await page.getByRole('button', { name: 'Send message' }).click();
+
+    await expect(page.getByText('Based on both searches')).toBeVisible({ timeout: 5000 });
+    await expect(page.getByText('Agent Activity')).toBeVisible();
+
+    // Expand trace to verify both tools are listed
+    await page.locator('.trace-toggle').click();
+    await expect(page.locator('.trace-container:not(.collapsed)')).toBeVisible({ timeout: 2000 });
+
+    const toolSteps = page.locator('.tool-step');
+    await expect(toolSteps).toHaveCount(2);
+  });
+
+  test('interleaved thinking and tool calls render correctly', async ({ page }) => {
+    await page.route('**/api/get_chat_response_stream', async (route) => {
+      const events = [
+        '{"type":"thinking_start","step_id":"think_1"}',
+        '{"type":"thinking_end","step_id":"think_1","duration_ms":300,"thinking_content":"Let me search the knowledge base first."}',
+        '{"type":"tool_start","tool_call_id":"tc_1","tool_name":"search_knowledge_base","tool_args":{"query":"Rucio transfers"}}',
+        '{"type":"tool_output","tool_call_id":"tc_1","output":"Found 3 documents about Rucio."}',
+        '{"type":"tool_end","tool_call_id":"tc_1","status":"success","duration_ms":187}',
+        '{"type":"thinking_start","step_id":"think_2"}',
+        '{"type":"thinking_end","step_id":"think_2","duration_ms":150,"thinking_content":"Now let me check the local files too."}',
+        '{"type":"tool_start","tool_call_id":"tc_2","tool_name":"search_local_files","tool_args":{"query":"Rucio errors"}}',
+        '{"type":"tool_output","tool_call_id":"tc_2","output":"Found log entries."}',
+        '{"type":"tool_end","tool_call_id":"tc_2","status":"success","duration_ms":443}',
+        '{"type":"final","response":"Here is the combined answer.","message_id":1,"user_message_id":1,"conversation_id":1}',
+      ];
+      await route.fulfill({
+        status: 200,
+        contentType: 'text/plain',
+        body: events.join('\n') + '\n',
+      });
+    });
+
+    await page.goto('/chat');
+
+    await page.getByLabel('Message input').fill('Search Rucio');
+    await page.getByRole('button', { name: 'Send message' }).click();
+
+    await expect(page.getByText('Here is the combined answer')).toBeVisible({ timeout: 5000 });
+    await expect(page.getByText('Agent Activity')).toBeVisible();
+
+    // Expand trace
+    await page.locator('.trace-toggle').click();
+    await expect(page.locator('.trace-container:not(.collapsed)')).toBeVisible({ timeout: 2000 });
+
+    // Should have 2 thinking steps and 2 tool steps
+    const thinkingSteps = page.locator('.thinking-step');
+    const toolSteps = page.locator('.tool-step');
+    await expect(toolSteps).toHaveCount(2);
+    await expect(thinkingSteps).toHaveCount(2);
+  });
+});
+
+test.describe('Tool Call Edge Cases', () => {
+  test.beforeEach(async ({ page }) => {
+    await setupBasicMocks(page);
+  });
+
+  test('tool_end without tool_start still renders response', async ({ page }) => {
+    // Orphan tool completion — the backend logs a warning but still
+    // emits the events. The UI should not crash.
+    await page.route('**/api/get_chat_response_stream', async (route) => {
+      const events = [
+        '{"type":"tool_output","tool_call_id":"orphan_1","output":"Orphan result"}',
+        '{"type":"tool_end","tool_call_id":"orphan_1","status":"success","duration_ms":null}',
+        '{"type":"final","response":"Answer despite orphan tool event.","message_id":1,"user_message_id":1,"conversation_id":1}',
+      ];
+      await route.fulfill({
+        status: 200,
+        contentType: 'text/plain',
+        body: events.join('\n') + '\n',
+      });
+    });
+
+    await page.goto('/chat');
+
+    await page.getByLabel('Message input').fill('Test orphan');
+    await page.getByRole('button', { name: 'Send message' }).click();
+
+    // Should still show the final response without crashing
+    await expect(page.getByText('Answer despite orphan tool event')).toBeVisible({ timeout: 5000 });
+  });
+
+  test('tool call with error status displays correctly', async ({ page }) => {
+    await page.route('**/api/get_chat_response_stream', async (route) => {
+      const events = [
+        '{"type":"tool_start","tool_call_id":"tc_err","tool_name":"search_metadata_index","tool_args":{"query":"bad"}}',
+        '{"type":"tool_output","tool_call_id":"tc_err","output":"Error: connection timeout"}',
+        '{"type":"tool_end","tool_call_id":"tc_err","status":"error","duration_ms":5000}',
+        '{"type":"final","response":"The search encountered an error.","message_id":1,"user_message_id":1,"conversation_id":1}',
+      ];
+      await route.fulfill({
+        status: 200,
+        contentType: 'text/plain',
+        body: events.join('\n') + '\n',
+      });
+    });
+
+    await page.goto('/chat');
+
+    await page.getByLabel('Message input').fill('Search metadata');
+    await page.getByRole('button', { name: 'Send message' }).click();
+
+    await expect(page.getByText('The search encountered an error')).toBeVisible({ timeout: 5000 });
+    await expect(page.getByText('Agent Activity')).toBeVisible();
+  });
+
+  test('tool with empty output does not break rendering', async ({ page }) => {
+    await page.route('**/api/get_chat_response_stream', async (route) => {
+      const events = [
+        '{"type":"tool_start","tool_call_id":"tc_empty","tool_name":"list_metadata_schema","tool_args":{}}',
+        '{"type":"tool_output","tool_call_id":"tc_empty","output":""}',
+        '{"type":"tool_end","tool_call_id":"tc_empty","status":"success","duration_ms":50}',
+        '{"type":"final","response":"Schema is empty.","message_id":1,"user_message_id":1,"conversation_id":1}',
+      ];
+      await route.fulfill({
+        status: 200,
+        contentType: 'text/plain',
+        body: events.join('\n') + '\n',
+      });
+    });
+
+    await page.goto('/chat');
+
+    await page.getByLabel('Message input').fill('List schema');
+    await page.getByRole('button', { name: 'Send message' }).click();
+
+    await expect(page.getByText('Schema is empty')).toBeVisible({ timeout: 5000 });
+  });
+});
+
+test.describe('Streaming Error Events', () => {
+  test.beforeEach(async ({ page }) => {
+    await setupBasicMocks(page);
+  });
+
+  test('error event mid-stream does not freeze the UI', async ({ page }) => {
+    await page.route('**/api/get_chat_response_stream', async (route) => {
+      // Error event replaces any prior chunk content and returns immediately
+      const events = [
+        '{"type":"chunk","content":"Starting to answer..."}',
+        '{"type":"error","message":"Model rate limit exceeded"}',
+      ];
+      await route.fulfill({
+        status: 200,
+        contentType: 'text/plain',
+        body: events.join('\n') + '\n',
+      });
+    });
+
+    await page.goto('/chat');
+
+    await page.getByLabel('Message input').fill('Test error');
+    await page.getByRole('button', { name: 'Send message' }).click();
+
+    // Error message should be displayed
+    await expect(page.getByText('Model rate limit exceeded')).toBeVisible({ timeout: 5000 });
+
+    // Input should be re-enabled (not frozen)
+    await expect(page.getByLabel('Message input')).not.toBeDisabled({ timeout: 5000 });
+  });
+
+  test('HTTP 500 during stream shows error state', async ({ page }) => {
+    await page.route('**/api/get_chat_response_stream', async (route) => {
+      await route.fulfill({ status: 500, body: 'Internal Server Error' });
+    });
+
+    await page.goto('/chat');
+
+    await page.getByLabel('Message input').fill('Break me');
+    await page.getByRole('button', { name: 'Send message' }).click();
+
+    // Input should be re-enabled after error
+    await expect(page.getByLabel('Message input')).not.toBeDisabled({ timeout: 5000 });
+  });
+
+  test('network timeout shows error and re-enables input', async ({ page }) => {
+    await page.route('**/api/get_chat_response_stream', async (route) => {
+      await route.abort('timedout');
+    });
+
+    await page.goto('/chat');
+
+    await page.getByLabel('Message input').fill('Timeout test');
+    await page.getByRole('button', { name: 'Send message' }).click();
+
+    // Input should be re-enabled after timeout
+    await expect(page.getByLabel('Message input')).not.toBeDisabled({ timeout: 10000 });
+  });
+});
+
+test.describe('Usage & Metadata', () => {
+  test.beforeEach(async ({ page }) => {
+    await setupBasicMocks(page);
+  });
+
+  test('accumulated usage from multiple API calls displays correctly', async ({ page }) => {
+    await page.route('**/api/get_chat_response_stream', async (route) => {
+      const events = [
+        '{"type":"thinking_start","step_id":"t1"}',
+        '{"type":"thinking_end","step_id":"t1","duration_ms":100,"thinking_content":"Planning..."}',
+        '{"type":"tool_start","tool_call_id":"tc_1","tool_name":"search_knowledge_base","tool_args":{"query":"test"}}',
+        '{"type":"tool_output","tool_call_id":"tc_1","output":"Results"}',
+        '{"type":"tool_end","tool_call_id":"tc_1","status":"success","duration_ms":200}',
+        '{"type":"final","response":"Done.","message_id":1,"user_message_id":1,"conversation_id":1,"usage":{"prompt_tokens":500,"completion_tokens":200,"total_tokens":700}}',
+      ];
+      await route.fulfill({
+        status: 200,
+        contentType: 'text/plain',
+        body: events.join('\n') + '\n',
+      });
+    });
+
+    await page.goto('/chat');
+
+    await page.getByLabel('Message input').fill('Test');
+    await page.getByRole('button', { name: 'Send message' }).click();
+
+    await expect(page.getByText('Agent Activity')).toBeVisible({ timeout: 5000 });
+
+    // Expand trace to see the context meter
+    await page.locator('.trace-toggle').click();
+    await expect(page.locator('.trace-container:not(.collapsed)')).toBeVisible({ timeout: 2000 });
+
+    // Meter label should show accumulated token usage
+    await expect(page.locator('.meter-label')).toBeVisible();
+  });
+
+  test('response without usage data still displays', async ({ page }) => {
+    await page.route('**/api/get_chat_response_stream', async (route) => {
+      const events = [
+        '{"type":"final","response":"No usage data.","message_id":1,"user_message_id":1,"conversation_id":1}',
+      ];
+      await route.fulfill({
+        status: 200,
+        contentType: 'text/plain',
+        body: events.join('\n') + '\n',
+      });
+    });
+
+    await page.goto('/chat');
+
+    await page.getByLabel('Message input').fill('Hello');
+    await page.getByRole('button', { name: 'Send message' }).click();
+
+    await expect(page.getByText('No usage data')).toBeVisible({ timeout: 5000 });
+  });
+});
+
+test.describe('Thinking Edge Cases', () => {
+  test.beforeEach(async ({ page }) => {
+    await setupBasicMocks(page);
+  });
+
+  test('thinking without tool calls still shows activity', async ({ page }) => {
+    await page.route('**/api/get_chat_response_stream', async (route) => {
+      const events = [
+        '{"type":"thinking_start","step_id":"t1"}',
+        '{"type":"thinking_end","step_id":"t1","duration_ms":800,"thinking_content":"Let me think about this carefully..."}',
+        '{"type":"final","response":"Here is my thoughtful answer.","message_id":1,"user_message_id":1,"conversation_id":1}',
+      ];
+      await route.fulfill({
+        status: 200,
+        contentType: 'text/plain',
+        body: events.join('\n') + '\n',
+      });
+    });
+
+    await page.goto('/chat');
+
+    await page.getByLabel('Message input').fill('Think deeply');
+    await page.getByRole('button', { name: 'Send message' }).click();
+
+    await expect(page.getByText('Here is my thoughtful answer')).toBeVisible({ timeout: 5000 });
+    await expect(page.getByText('Agent Activity')).toBeVisible();
+  });
+
+  test('multiple thinking phases render as separate steps', async ({ page }) => {
+    await page.route('**/api/get_chat_response_stream', async (route) => {
+      const events = [
+        '{"type":"thinking_start","step_id":"t1"}',
+        '{"type":"thinking_end","step_id":"t1","duration_ms":200,"thinking_content":"First thought..."}',
+        '{"type":"thinking_start","step_id":"t2"}',
+        '{"type":"thinking_end","step_id":"t2","duration_ms":300,"thinking_content":"Second thought..."}',
+        '{"type":"thinking_start","step_id":"t3"}',
+        '{"type":"thinking_end","step_id":"t3","duration_ms":100,"thinking_content":"Final thought."}',
+        '{"type":"final","response":"After much deliberation.","message_id":1,"user_message_id":1,"conversation_id":1}',
+      ];
+      await route.fulfill({
+        status: 200,
+        contentType: 'text/plain',
+        body: events.join('\n') + '\n',
+      });
+    });
+
+    await page.goto('/chat');
+
+    await page.getByLabel('Message input').fill('Think three times');
+    await page.getByRole('button', { name: 'Send message' }).click();
+
+    await expect(page.getByText('After much deliberation')).toBeVisible({ timeout: 5000 });
+
+    // Expand trace
+    await page.locator('.trace-toggle').click();
+    await expect(page.locator('.trace-container:not(.collapsed)')).toBeVisible({ timeout: 2000 });
+
+    // Should have 3 thinking steps
+    const thinkingSteps = page.locator('.thinking-step');
+    await expect(thinkingSteps).toHaveCount(3);
+  });
+
+  test('thinking_end without thinking_start does not crash', async ({ page }) => {
+    // Edge case: orphan thinking_end event
+    await page.route('**/api/get_chat_response_stream', async (route) => {
+      const events = [
+        '{"type":"thinking_end","step_id":"orphan","duration_ms":0,"thinking_content":""}',
+        '{"type":"final","response":"Normal response.","message_id":1,"user_message_id":1,"conversation_id":1}',
+      ];
+      await route.fulfill({
+        status: 200,
+        contentType: 'text/plain',
+        body: events.join('\n') + '\n',
+      });
+    });
+
+    await page.goto('/chat');
+
+    await page.getByLabel('Message input').fill('Edge case');
+    await page.getByRole('button', { name: 'Send message' }).click();
+
+    await expect(page.getByText('Normal response')).toBeVisible({ timeout: 5000 });
+  });
+});
diff --git a/tests/unit/test_adapter_error_paths.py b/tests/unit/test_adapter_error_paths.py
new file mode 100644
index 000000000..41306ca7c
--- /dev/null
+++ b/tests/unit/test_adapter_error_paths.py
@@ -0,0 +1,312 @@
+"""Unit tests for CopilotEventAdapter error and concurrency paths.
+
+Regression tests for bugs #7 (deadlock on session error),
+#10 (queue poll_timeout hang), and cancellation propagation.
+"""
+
+import asyncio
+import queue
+import threading
+import time
+from unittest.mock import AsyncMock, MagicMock, patch
+
+import pytest
+
+from src.archi.pipelines.copilot_agents.copilot_event_adapter import _SENTINEL, CopilotEventAdapter
+from src.archi.utils.output_dataclass import PipelineOutput
+
+# ── Helpers ───────────────────────────────────────────────────────────────
+
+
+class FakeAsyncLoop:
+    """Minimal stub for AsyncLoopThread."""
+
+    def __init__(self):
+        self._loop = None
+
+    def run(self, coro, timeout=5.0):
+        loop = asyncio.new_event_loop()
+        try:
+            return loop.run_until_complete(coro)
+        finally:
+            loop.close()
+
+    def run_no_wait(self, coro):
+        loop = asyncio.new_event_loop()
+        future = asyncio.run_coroutine_threadsafe(coro, loop)
+        return future
+
+
+def _make_event(event_type, **kwargs):
+    """Create a mock SDK event with proper enum and data."""
+    from enum import Enum
+
+    SessionEventType = Enum(
+        "SessionEventType",
+        {
+            "ASSISTANT_MESSAGE_DELTA": "assistant.message_delta",
+            "SESSION_ERROR": "session.error",
+        },
+    )
+    _type_map = {
+        "assistant.message_delta": SessionEventType.ASSISTANT_MESSAGE_DELTA,
+        "session.error": SessionEventType.SESSION_ERROR,
+    }
+    ev = MagicMock()
+    ev.type = _type_map.get(event_type, event_type)
+    data = MagicMock()
+    for k, v in kwargs.items():
+        setattr(data, k, v)
+    ev.data = data
+    return ev
+
+
+# ── Tests ─────────────────────────────────────────────────────────────────
+
+
+class TestSessionErrorUnblocksQueue:
+    """Regression for bug #7: _run_session raises; iter_outputs() must
+    terminate instead of deadlocking."""
+
+    def test_signal_done_after_error_unblocks(self):
+        """If the async session throws and calls signal_done() in its
+        finally block, iter_outputs() must drain and return."""
+        adapter = CopilotEventAdapter(FakeAsyncLoop())
+
+        # Simulate: some text arrives, then error, then signal_done
+        adapter._queue.put(
+            PipelineOutput(
+                answer="partial",
+                metadata={"event_type": "text"},
+                final=False,
+            )
+        )
+        adapter._queue.put(
+            PipelineOutput(
+                answer="",
+                metadata={"event_type": "error", "error": "session crashed"},
+                final=False,
+            )
+        )
+        adapter.signal_done()
+
+        results = list(adapter.iter_outputs())
+        assert len(results) == 2
+        assert results[0].answer == "partial"
+        assert results[1].metadata["event_type"] == "error"
+
+    def test_error_event_via_session_on_handler(self):
+        """session.error events log but don't crash the adapter."""
+        adapter = CopilotEventAdapter(FakeAsyncLoop())
+
+        session = MagicMock()
+        handler_ref = []
+        session.on = lambda h: (handler_ref.append(h), lambda: None)[1]
+        adapter.attach_to_session(session)
+
+        error_event = _make_event("session.error", message="SDK blew up")
+        handler_ref[0](error_event)
+
+        # No PipelineOutput for session.error — it only logs.
+        # The queue should still be empty.
+        assert adapter._queue.empty()
+
+    def test_no_sentinel_timeout_returns(self):
+        """If signal_done() never fires, iter_outputs must return after
+        poll_timeout (not hang forever). Regression for bug #10."""
+        adapter = CopilotEventAdapter(FakeAsyncLoop())
+        adapter._queue.put(
+            PipelineOutput(
+                answer="ok",
+                metadata={"event_type": "text"},
+                final=False,
+            )
+        )
+        # No signal_done — simulates async crash without cleanup
+
+        start = time.monotonic()
+        results = list(adapter.iter_outputs(poll_timeout=0.2))
+        elapsed = time.monotonic() - start
+
+        assert len(results) == 1
+        assert results[0].answer == "ok"
+        # Should have returned after ~0.2s, not minutes
+        assert elapsed < 2.0
+
+    def test_empty_queue_timeout_returns_nothing(self):
+        """Completely empty queue with no sentinel should still return
+        an empty list after poll_timeout, not hang."""
+        adapter = CopilotEventAdapter(FakeAsyncLoop())
+
+        start = time.monotonic()
+        results = list(adapter.iter_outputs(poll_timeout=0.1))
+        elapsed = time.monotonic() - start
+
+        assert results == []
+        assert elapsed < 2.0
+
+
+class TestCancellationPropagation:
+    """Cancellation via GeneratorExit should disconnect the session."""
+
+    def test_generator_exit_sets_cancelled(self):
+        """When the consumer raises GeneratorExit, the adapter must set
+        _cancelled = True and attempt session disconnect."""
+        adapter = CopilotEventAdapter(FakeAsyncLoop())
+
+        # Mock a session with a disconnect method
+        mock_session = MagicMock()
+
+        async def fake_disconnect():
+            pass
+
+        mock_session.disconnect = MagicMock(return_value=fake_disconnect())
+        adapter._session = mock_session
+
+        # Put some items to iterate
+        adapter._queue.put(
+            PipelineOutput(
+                answer="a",
+                metadata={"event_type": "text"},
+                final=False,
+            )
+        )
+        adapter._queue.put(
+            PipelineOutput(
+                answer="b",
+                metadata={"event_type": "text"},
+                final=False,
+            )
+        )
+        adapter._queue.put(_SENTINEL)
+
+        gen = adapter.iter_outputs()
+        # Consume first item, then close (simulates client disconnect)
+        first = next(gen)
+        assert first.answer == "a"
+        gen.close()
+
+        assert adapter._cancelled is True
+
+    def test_cancelled_flag_suppresses_events(self):
+        """After cancellation, new events pushed by the handler should
+        be ignored."""
+        adapter = CopilotEventAdapter(FakeAsyncLoop())
+        adapter._cancelled = True
+
+        session = MagicMock()
+        handler_ref = []
+        session.on = lambda h: (handler_ref.append(h), lambda: None)[1]
+        adapter.attach_to_session(session)
+
+        # Fire an event after cancellation
+        event = _make_event("assistant.message_delta", delta_content="ignored")
+        handler_ref[0](event)
+
+        # Queue should still be empty (event was suppressed)
+        assert adapter._queue.empty()
+
+
+class TestConcurrentAccess:
+    """Verify adapter works correctly when producer and consumer run
+    on different threads (the actual deployment pattern)."""
+
+    def test_threaded_producer_consumer(self):
+        """Producer pushes events from one thread, consumer drains
+        iter_outputs() from another. Must not deadlock."""
+        adapter = CopilotEventAdapter(FakeAsyncLoop())
+        results = []
+        errors = []
+
+        def producer():
+            try:
+                time.sleep(0.05)
+                for i in range(5):
+                    adapter._queue.put(
+                        PipelineOutput(
+                            answer=f"chunk-{i}",
+                            metadata={"event_type": "text"},
+                            final=False,
+                        )
+                    )
+                    time.sleep(0.01)
+                adapter.signal_done()
+            except Exception as e:
+                errors.append(e)
+
+        def consumer():
+            try:
+                for output in adapter.iter_outputs(poll_timeout=5.0):
+                    results.append(output.answer)
+            except Exception as e:
+                errors.append(e)
+
+        t_prod = threading.Thread(target=producer)
+        t_cons = threading.Thread(target=consumer)
+        t_cons.start()
+        t_prod.start()
+        t_prod.join(timeout=5)
+        t_cons.join(timeout=5)
+
+        assert not errors, f"Thread errors: {errors}"
+        assert results == [f"chunk-{i}" for i in range(5)]
+
+    def test_threaded_producer_crashes(self):
+        """Producer crashes without signal_done — consumer must still
+        return after poll_timeout. Regression for bug #7."""
+        adapter = CopilotEventAdapter(FakeAsyncLoop())
+        results = []
+
+        def crashing_producer():
+            adapter._queue.put(
+                PipelineOutput(
+                    answer="before crash",
+                    metadata={"event_type": "text"},
+                    final=False,
+                )
+            )
+            # Crash — no signal_done()
+
+        def consumer():
+            for output in adapter.iter_outputs(poll_timeout=0.3):
+                results.append(output.answer)
+
+        t_prod = threading.Thread(target=crashing_producer)
+        t_cons = threading.Thread(target=consumer)
+        t_cons.start()
+        t_prod.start()
+        t_prod.join(timeout=2)
+        t_cons.join(timeout=5)
+
+        assert results == ["before crash"]
+
+
+class TestAttachToSessionCorrectness:
+    """Verify attach_to_session wires up the event handler correctly."""
+
+    def test_session_on_called(self):
+        """attach_to_session must call session.on() with a handler."""
+        adapter = CopilotEventAdapter(FakeAsyncLoop())
+        session = MagicMock()
+        calls = []
+        session.on = lambda h: (calls.append(h), lambda: None)[1]
+
+        adapter.attach_to_session(session)
+
+        assert len(calls) == 1
+        assert callable(calls[0])
+        assert adapter._session is session
+
+    def test_multiple_attach_replaces_session(self):
+        """Re-attaching to a new session should update _session reference."""
+        adapter = CopilotEventAdapter(FakeAsyncLoop())
+
+        session1 = MagicMock()
+        session1.on = lambda h: (None, lambda: None)[1]
+        adapter.attach_to_session(session1)
+        assert adapter._session is session1
+
+        session2 = MagicMock()
+        session2.on = lambda h: (None, lambda: None)[1]
+        adapter.attach_to_session(session2)
+        assert adapter._session is session2
diff --git a/tests/unit/test_byok.py b/tests/unit/test_byok.py
index 612208cd3..0f21e1e79 100644
--- a/tests/unit/test_byok.py
+++ b/tests/unit/test_byok.py
@@ -3,14 +3,12 @@
 
 Tests cover:
 - Key hierarchy (env > session)
-- Session key storage
-- API key endpoints
-- Provider key integration
+- Provider key methods
+- Security (key not exposed in serialization)
 """
 
 import pytest
-from unittest.mock import Mock, patch, MagicMock
-from flask import Flask, session
+from unittest.mock import patch
 
 
 class TestKeyHierarchy:
@@ -52,48 +50,6 @@ def test_session_key_used_when_no_env(self):
         assert provider.is_configured is True
 
 
-class TestProviderKeyIntegration:
-    """Test provider factory functions with API keys."""
-    
-    def test_get_provider_with_api_key_creates_new_instance(self):
-        """get_provider_with_api_key should create a fresh provider instance."""
-        from src.archi.providers import get_provider_with_api_key, ProviderType
-        
-        provider1 = get_provider_with_api_key(ProviderType.OPENAI, "sk-key-1")
-        provider2 = get_provider_with_api_key(ProviderType.OPENAI, "sk-key-2")
-        
-        # Should be different instances
-        assert provider1 is not provider2
-        # With different keys
-        assert provider1.api_key == "sk-key-1"
-        assert provider2.api_key == "sk-key-2"
-    
-    def test_get_chat_model_with_api_key(self):
-        """get_chat_model_with_api_key should return a configured model."""
-        from src.archi.providers import get_chat_model_with_api_key, ProviderType
-        
-        # Test that function accepts api_key parameter and returns a model object
-        # (validation happens at request time, not creation time)
-        model = get_chat_model_with_api_key(
-            ProviderType.OPENAI, 
-            "gpt-4o-mini", 
-            "sk-test-key"
-        )
-        assert model is not None
-    
-    def test_provider_types_supported(self):
-        """All expected provider types should be supported."""
-        from src.archi.providers import ProviderType, list_provider_types
-        
-        types = list_provider_types()
-        
-        assert ProviderType.OPENAI in types
-        assert ProviderType.ANTHROPIC in types
-        assert ProviderType.GEMINI in types
-        assert ProviderType.OPENROUTER in types
-        assert ProviderType.LOCAL in types
-
-
 class TestBaseProviderKeyMethods:
     """Test BaseProvider key-related methods."""
     
@@ -142,31 +98,6 @@ def test_is_configured_without_key(self):
             assert provider.is_configured is False
 
 
-class TestProviderDisplayNames:
-    """Test that providers have correct display names."""
-    
-    def test_openai_display_name(self):
-        """OpenAI provider should have correct display name."""
-        from src.archi.providers import get_provider_with_api_key, ProviderType
-        
-        provider = get_provider_with_api_key(ProviderType.OPENAI, "test-key")
-        assert provider.display_name == "OpenAI"
-    
-    def test_anthropic_display_name(self):
-        """Anthropic provider should have correct display name."""
-        from src.archi.providers import get_provider_with_api_key, ProviderType
-        
-        provider = get_provider_with_api_key(ProviderType.ANTHROPIC, "test-key")
-        assert provider.display_name == "Anthropic"
-    
-    def test_gemini_display_name(self):
-        """Gemini provider should have correct display name."""
-        from src.archi.providers import get_provider_with_api_key, ProviderType
-        
-        provider = get_provider_with_api_key(ProviderType.GEMINI, "test-key")
-        assert provider.display_name == "Google Gemini"
-
-
 class TestSecurityRequirements:
     """Test security-related requirements."""
     
@@ -190,31 +121,3 @@ def test_api_key_not_in_repr(self):
         # Check that the key doesn't appear in any string representation
         repr_str = repr(provider) if hasattr(provider, '__repr__') else str(provider)
         assert "secret-key-12345" not in repr_str
-
-
-class TestModelInfo:
-    """Test ModelInfo dataclass."""
-    
-    def test_model_info_to_dict(self):
-        """ModelInfo.to_dict() should return correct structure."""
-        from src.archi.providers.base import ModelInfo
-        
-        model = ModelInfo(
-            id="gpt-4o",
-            name="gpt-4o",
-            display_name="GPT-4o",
-            context_window=128000,
-            supports_tools=True,
-            supports_streaming=True,
-            supports_vision=True,
-        )
-        
-        d = model.to_dict()
-        
-        assert d["id"] == "gpt-4o"
-        assert d["name"] == "gpt-4o"
-        assert d["display_name"] == "GPT-4o"
-        assert d["context_window"] == 128000
-        assert d["supports_tools"] is True
-        assert d["supports_streaming"] is True
-        assert d["supports_vision"] is True
diff --git a/tests/unit/test_chat_wrapper_stream.py b/tests/unit/test_chat_wrapper_stream.py
new file mode 100644
index 000000000..88aae8b14
--- /dev/null
+++ b/tests/unit/test_chat_wrapper_stream.py
@@ -0,0 +1,578 @@
+"""Unit tests for ChatWrapper.stream() event routing.
+
+Tests the mapping from PipelineOutput events (as produced by
+CopilotEventAdapter) through ChatWrapper.stream() to the NDJSON events
+consumed by the frontend.  Uses a mock archi instance so no real
+pipeline, vectorstore, or database is needed.
+
+Regression tests for bug #9 (error event not handled).
+"""
+
+import time
+from dataclasses import dataclass
+from datetime import datetime, timezone
+from typing import Any, Dict, List, Optional
+from unittest.mock import MagicMock, PropertyMock, patch
+
+import pytest
+
+from src.archi.utils.output_dataclass import PipelineOutput
+
+# ── Minimal context stub ─────────────────────────────────────────────────
+
+
+@dataclass
+class FakeChatContext:
+    sender: str = "user"
+    content: str = "Hello"
+    conversation_id: int = 42
+    history: list = None
+    is_refresh: bool = False
+    model_used: str = None
+    provider_used: str = None
+    pipeline_used: str = None
+
+    def __post_init__(self):
+        if self.history is None:
+            self.history = [("user", "Hello")]
+
+
+# ── Helper to create a minimal ChatWrapper for testing ───────────────────
+
+
+def _make_chat_wrapper(pipeline_outputs: List[PipelineOutput]):
+    """Create a ChatWrapper-like object with .stream() that routes
+    the given pipeline outputs through the full event-routing logic.
+
+    We monkey-patch heavily to avoid needing Flask, Postgres, etc.
+    """
+    # Import ChatWrapper — but we can't instantiate it (needs full config).
+    # Instead, test the event routing logic by extracting the core loop.
+    # We'll build a simplified version that exercises the same code paths.
+
+    # Actually, the simplest approach: create a mock that has the real
+    # stream() method bound to it, with all dependencies mocked.
+    from src.interfaces.chat_app.app import ChatWrapper
+
+    wrapper = object.__new__(ChatWrapper)
+
+    # Set up minimal required attributes
+    wrapper.lock = MagicMock()
+    wrapper.number_of_queries = 0
+    wrapper.conn = None
+    wrapper.cursor = None
+    wrapper.config = {"name": "test"}
+    wrapper.current_config_name = "test"
+    wrapper.similarity_score_reference = 0.5
+    wrapper.current_model_used = "test-model"
+    wrapper.current_pipeline_used = "CopilotAgentPipeline"
+
+    # Mock archi to yield our controlled outputs
+    mock_archi = MagicMock()
+    mock_archi.stream = MagicMock(return_value=iter(pipeline_outputs))
+    mock_archi.pipeline_name = "CopilotAgentPipeline"
+    mock_archi.pipeline = MagicMock()
+    mock_archi.pipeline.supports_persisted_session_id = MagicMock(return_value=True)
+    wrapper.archi = mock_archi
+
+    # Mock all DB/context methods
+    wrapper._prepare_chat_context = MagicMock(
+        return_value=(FakeChatContext(), None)  # context, error_code
+    )
+    wrapper._resolve_config_name = MagicMock(return_value="test")
+    wrapper.update_config = MagicMock()
+    wrapper.create_agent_trace = MagicMock(return_value="trace-123")
+    wrapper.update_agent_trace = MagicMock()
+    wrapper._init_timestamps = MagicMock(return_value={})
+    wrapper._create_provider_llm = MagicMock(return_value=None)
+    wrapper.insert_timing = MagicMock()
+    wrapper.get_pipeline_session_id = MagicMock(return_value=None)
+    wrapper.set_pipeline_session_id = MagicMock()
+
+    # Mock _finalize_result to return the output text and message IDs
+    def _fake_finalize(
+        result, *, context, server_received_msg_ts, timestamps, render_markdown=True
+    ):
+        return result.answer or "finalized", [1, 2]
+
+    wrapper._finalize_result = _fake_finalize
+    wrapper.insert_tool_calls_from_output = MagicMock()
+    wrapper.get_top_sources = MagicMock(return_value=[])
+    wrapper.format_links_markdown = MagicMock(return_value="")
+
+    return wrapper
+
+
+def _collect_events(wrapper, **kwargs) -> List[Dict[str, Any]]:
+    """Run wrapper.stream() and collect all yielded events."""
+    now = datetime.now(timezone.utc)
+    defaults = dict(
+        message=[("user", "Hello")],
+        conversation_id=42,
+        client_id="test-client",
+        is_refresh=False,
+        server_received_msg_ts=now,
+        client_sent_msg_ts=now.timestamp(),
+        client_timeout=30.0,
+        config_name="test",
+    )
+    defaults.update(kwargs)
+    return list(wrapper.stream(**defaults))
+
+
+# ── Tests ─────────────────────────────────────────────────────────────────
+
+
+class TestStreamTextEvents:
+    """Verify text chunks are routed correctly."""
+
+    def test_text_events_yield_chunks(self):
+        outputs = [
+            PipelineOutput(
+                answer="Hello", metadata={"event_type": "text"}, final=False
+            ),
+            PipelineOutput(
+                answer="Hello world", metadata={"event_type": "text"}, final=False
+            ),
+            PipelineOutput(
+                answer="Hello world", metadata={"event_type": "final"}, final=True
+            ),
+        ]
+        wrapper = _make_chat_wrapper(outputs)
+        events = _collect_events(wrapper)
+
+        chunks = [e for e in events if e.get("type") == "chunk"]
+        assert len(chunks) == 2
+        assert chunks[0]["content"] == "Hello"
+        assert chunks[1]["content"] == "Hello world"
+        assert chunks[0]["accumulated"] is True
+
+    def test_empty_text_not_yielded(self):
+        outputs = [
+            PipelineOutput(answer="", metadata={"event_type": "text"}, final=False),
+            PipelineOutput(
+                answer="Hello", metadata={"event_type": "text"}, final=False
+            ),
+            PipelineOutput(
+                answer="Hello", metadata={"event_type": "final"}, final=True
+            ),
+        ]
+        wrapper = _make_chat_wrapper(outputs)
+        events = _collect_events(wrapper)
+
+        chunks = [e for e in events if e.get("type") == "chunk"]
+        # Empty text should NOT yield a chunk
+        assert len(chunks) == 1
+        assert chunks[0]["content"] == "Hello"
+
+
+class TestStreamToolEvents:
+    """Verify tool lifecycle events pass through."""
+
+    def test_tool_start_event(self):
+        outputs = [
+            PipelineOutput(
+                answer="",
+                metadata={
+                    "event_type": "tool_start",
+                    "tool_call_id": "tc-1",
+                    "tool_name": "search_vectorstore_hybrid",
+                    "tool_args": {"query": "test"},
+                },
+                final=False,
+            ),
+            PipelineOutput(
+                answer="result", metadata={"event_type": "text"}, final=False
+            ),
+            PipelineOutput(
+                answer="result", metadata={"event_type": "final"}, final=True
+            ),
+        ]
+        wrapper = _make_chat_wrapper(outputs)
+        events = _collect_events(wrapper)
+
+        tool_starts = [e for e in events if e.get("type") == "tool_start"]
+        assert len(tool_starts) == 1
+        assert tool_starts[0]["tool_name"] == "search_vectorstore_hybrid"
+        assert tool_starts[0]["tool_call_id"] == "tc-1"
+        assert tool_starts[0]["tool_args"] == {"query": "test"}
+
+    def test_tool_output_event(self):
+        outputs = [
+            PipelineOutput(
+                answer="",
+                metadata={
+                    "event_type": "tool_output",
+                    "tool_call_id": "tc-1",
+                    "output": "Found 3 documents",
+                },
+                final=False,
+            ),
+            PipelineOutput(answer="done", metadata={"event_type": "final"}, final=True),
+        ]
+        wrapper = _make_chat_wrapper(outputs)
+        events = _collect_events(wrapper)
+
+        tool_outputs = [e for e in events if e.get("type") == "tool_output"]
+        assert len(tool_outputs) == 1
+        assert tool_outputs[0]["output"] == "Found 3 documents"
+        assert tool_outputs[0]["tool_call_id"] == "tc-1"
+
+    def test_tool_end_event(self):
+        outputs = [
+            PipelineOutput(
+                answer="",
+                metadata={
+                    "event_type": "tool_end",
+                    "tool_call_id": "tc-1",
+                    "status": "success",
+                    "duration_ms": 150,
+                },
+                final=False,
+            ),
+            PipelineOutput(answer="done", metadata={"event_type": "final"}, final=True),
+        ]
+        wrapper = _make_chat_wrapper(outputs)
+        events = _collect_events(wrapper)
+
+        tool_ends = [e for e in events if e.get("type") == "tool_end"]
+        assert len(tool_ends) == 1
+        assert tool_ends[0]["status"] == "success"
+
+    def test_tool_output_truncation(self):
+        long_output = "x" * 2000
+        outputs = [
+            PipelineOutput(
+                answer="",
+                metadata={
+                    "event_type": "tool_output",
+                    "tool_call_id": "tc-1",
+                    "output": long_output,
+                },
+                final=False,
+            ),
+            PipelineOutput(answer="done", metadata={"event_type": "final"}, final=True),
+        ]
+        wrapper = _make_chat_wrapper(outputs)
+        events = _collect_events(wrapper)
+
+        tool_outputs = [e for e in events if e.get("type") == "tool_output"]
+        assert len(tool_outputs) == 1
+        assert tool_outputs[0]["truncated"] is True
+        assert tool_outputs[0]["full_length"] == 2000
+        assert len(tool_outputs[0]["output"]) <= 800
+
+
+class TestStreamThinkingEvents:
+    """Verify thinking lifecycle events."""
+
+    def test_thinking_start_and_end(self):
+        outputs = [
+            PipelineOutput(
+                answer="",
+                metadata={"event_type": "thinking_start", "step_id": "s1"},
+                final=False,
+            ),
+            PipelineOutput(
+                answer="",
+                metadata={
+                    "event_type": "thinking_end",
+                    "step_id": "s1",
+                    "duration_ms": 200,
+                    "thinking_content": "Let me analyze...",
+                },
+                final=False,
+            ),
+            PipelineOutput(
+                answer="Result", metadata={"event_type": "text"}, final=False
+            ),
+            PipelineOutput(
+                answer="Result", metadata={"event_type": "final"}, final=True
+            ),
+        ]
+        wrapper = _make_chat_wrapper(outputs)
+        events = _collect_events(wrapper)
+
+        thinking_starts = [e for e in events if e.get("type") == "thinking_start"]
+        thinking_ends = [e for e in events if e.get("type") == "thinking_end"]
+
+        assert len(thinking_starts) == 1
+        assert thinking_starts[0]["step_id"] == "s1"
+        assert len(thinking_ends) == 1
+        assert thinking_ends[0]["thinking_content"] == "Let me analyze..."
+        assert thinking_ends[0]["step_id"] == "s1"
+
+
+class TestStreamFinalEvent:
+    """Verify the final event has all required fields."""
+
+    def test_final_event_structure(self):
+        outputs = [
+            PipelineOutput(
+                answer="Hello", metadata={"event_type": "text"}, final=False
+            ),
+            PipelineOutput(
+                answer="Hello",
+                metadata={
+                    "event_type": "final",
+                    "usage": {
+                        "prompt_tokens": 100,
+                        "completion_tokens": 50,
+                        "total_tokens": 150,
+                    },
+                },
+                final=True,
+            ),
+        ]
+        wrapper = _make_chat_wrapper(outputs)
+        events = _collect_events(wrapper)
+
+        finals = [e for e in events if e.get("type") == "final"]
+        assert len(finals) == 1
+        final = finals[0]
+        assert "response" in final
+        assert "conversation_id" in final
+        assert final["conversation_id"] == 42
+        assert final["usage"] == {
+            "prompt_tokens": 100,
+            "completion_tokens": 50,
+            "total_tokens": 150,
+        }
+        assert "trace_id" in final
+
+    def test_no_output_yields_error(self):
+        """If pipeline yields nothing, stream should return an error."""
+        wrapper = _make_chat_wrapper([])
+        events = _collect_events(wrapper)
+
+        errors = [e for e in events if e.get("type") == "error"]
+        assert len(errors) == 1
+        assert errors[0]["status"] == 500
+
+
+class TestStreamErrorEvent:
+    """Regression for bug #9: error events must propagate to client."""
+
+    def test_error_event_yielded(self):
+        outputs = [
+            PipelineOutput(
+                answer="partial", metadata={"event_type": "text"}, final=False
+            ),
+            PipelineOutput(
+                answer="",
+                metadata={"event_type": "error", "error": "SDK session crashed"},
+                final=False,
+            ),
+            # After error, pipeline might still yield final
+            PipelineOutput(
+                answer="partial", metadata={"event_type": "final"}, final=True
+            ),
+        ]
+        wrapper = _make_chat_wrapper(outputs)
+        events = _collect_events(wrapper)
+
+        errors = [e for e in events if e.get("type") == "error"]
+        assert len(errors) == 1
+        assert "SDK session crashed" in errors[0]["message"]
+
+    def test_error_event_does_not_stop_stream(self):
+        """An error event mid-stream should not prevent the final event."""
+        outputs = [
+            PipelineOutput(
+                answer="",
+                metadata={"event_type": "error", "error": "tool failed"},
+                final=False,
+            ),
+            PipelineOutput(
+                answer="recovered", metadata={"event_type": "text"}, final=False
+            ),
+            PipelineOutput(
+                answer="recovered", metadata={"event_type": "final"}, final=True
+            ),
+        ]
+        wrapper = _make_chat_wrapper(outputs)
+        events = _collect_events(wrapper)
+
+        finals = [e for e in events if e.get("type") == "final"]
+        assert len(finals) == 1
+
+
+class TestStreamUsageInTrace:
+    """Verify usage data ends up in trace events."""
+
+    def test_usage_appended_to_trace_events(self):
+        outputs = [
+            PipelineOutput(answer="Hi", metadata={"event_type": "text"}, final=False),
+            PipelineOutput(
+                answer="Hi",
+                metadata={
+                    "event_type": "final",
+                    "usage": {
+                        "prompt_tokens": 200,
+                        "completion_tokens": 80,
+                        "total_tokens": 280,
+                    },
+                },
+                final=True,
+            ),
+        ]
+        wrapper = _make_chat_wrapper(outputs)
+        events = _collect_events(wrapper)
+
+        # The update_agent_trace call should have received usage in trace_events
+        assert wrapper.update_agent_trace.called
+        call_kwargs = wrapper.update_agent_trace.call_args
+        trace_events = call_kwargs.kwargs.get("events") or call_kwargs[1].get(
+            "events", []
+        )
+        usage_events = [e for e in trace_events if e.get("type") == "usage"]
+        assert len(usage_events) == 1
+        assert usage_events[0]["prompt_tokens"] == 200
+        assert usage_events[0]["completion_tokens"] == 80
+
+
+class TestStreamContextPrepFailure:
+    """Verify stream handles _prepare_chat_context errors."""
+
+    def test_timeout_error(self):
+        wrapper = _make_chat_wrapper([])
+        wrapper._prepare_chat_context = MagicMock(return_value=(None, 408))
+
+        events = _collect_events(wrapper)
+        errors = [e for e in events if e.get("type") == "error"]
+        assert len(errors) == 1
+        assert errors[0]["status"] == 408
+        assert "timeout" in errors[0]["message"]
+
+    def test_conversation_not_found(self):
+        wrapper = _make_chat_wrapper([])
+        wrapper._prepare_chat_context = MagicMock(return_value=(None, 403))
+
+        events = _collect_events(wrapper)
+        errors = [e for e in events if e.get("type") == "error"]
+        assert len(errors) == 1
+        assert errors[0]["status"] == 403
+
+
+class TestStreamToolStepsDisabled:
+    """Verify include_tool_steps=False suppresses tool events."""
+
+    def test_tool_events_suppressed(self):
+        outputs = [
+            PipelineOutput(
+                answer="",
+                metadata={
+                    "event_type": "tool_start",
+                    "tool_call_id": "tc-1",
+                    "tool_name": "search_vectorstore_hybrid",
+                    "tool_args": {},
+                },
+                final=False,
+            ),
+            PipelineOutput(
+                answer="",
+                metadata={
+                    "event_type": "tool_output",
+                    "tool_call_id": "tc-1",
+                    "output": "data",
+                },
+                final=False,
+            ),
+            PipelineOutput(answer="done", metadata={"event_type": "text"}, final=False),
+            PipelineOutput(answer="done", metadata={"event_type": "final"}, final=True),
+        ]
+        wrapper = _make_chat_wrapper(outputs)
+        events = _collect_events(wrapper, include_tool_steps=False)
+
+        tool_events = [
+            e for e in events if e.get("type") in ("tool_start", "tool_output")
+        ]
+        assert tool_events == []
+
+        chunks = [e for e in events if e.get("type") == "chunk"]
+        assert len(chunks) == 1
+
+
+class TestStreamProviderOverride:
+    """Bug #15/#16: provider/model/api_key forwarded to archi.stream()."""
+
+    def test_provider_model_passed_to_archi_stream(self):
+        """When provider & model are set, they appear in archi.stream() kwargs."""
+        outputs = [
+            PipelineOutput(answer="ok", metadata={"event_type": "final"}, final=True),
+        ]
+        wrapper = _make_chat_wrapper(outputs)
+        _collect_events(wrapper, provider="anthropic", model="claude-sonnet-4-20250514")
+
+        # archi.stream() should have been called with provider and model kwargs
+        call_kwargs = wrapper.archi.stream.call_args
+        assert call_kwargs is not None
+        # Check kwargs (may be passed as keyword args)
+        kwargs = call_kwargs.kwargs if call_kwargs.kwargs else {}
+        assert kwargs.get("provider") == "anthropic"
+        assert kwargs.get("model") == "claude-sonnet-4-20250514"
+
+    def test_api_key_passed_to_archi_stream(self):
+        """When provider_api_key is set, it appears in archi.stream() kwargs."""
+        outputs = [
+            PipelineOutput(answer="ok", metadata={"event_type": "final"}, final=True),
+        ]
+        wrapper = _make_chat_wrapper(outputs)
+        _collect_events(
+            wrapper, provider="openai", model="gpt-4o", provider_api_key="sk-user-key"
+        )
+
+        kwargs = wrapper.archi.stream.call_args.kwargs
+        assert kwargs.get("provider_api_key") == "sk-user-key"
+
+    def test_no_override_when_provider_not_set(self):
+        """Without provider/model, they should not appear in archi.stream() kwargs."""
+        outputs = [
+            PipelineOutput(answer="ok", metadata={"event_type": "final"}, final=True),
+        ]
+        wrapper = _make_chat_wrapper(outputs)
+        _collect_events(wrapper)
+
+        kwargs = wrapper.archi.stream.call_args.kwargs
+        assert "provider" not in kwargs
+        assert "model" not in kwargs
+        assert "provider_api_key" not in kwargs
+
+
+class TestCopilotSessionPersistence:
+    """Persist and reuse real Copilot SDK session IDs across turns."""
+
+    def test_stored_session_id_forwarded_to_pipeline(self):
+        outputs = [
+            PipelineOutput(answer="ok", metadata={"event_type": "final"}, final=True),
+        ]
+        wrapper = _make_chat_wrapper(outputs)
+        wrapper.get_pipeline_session_id.return_value = "sdk-session-123"
+
+        _collect_events(wrapper)
+
+        kwargs = wrapper.archi.stream.call_args.kwargs
+        assert kwargs.get("pipeline_session_id") == "sdk-session-123"
+
+    def test_final_output_persists_new_session_id(self):
+        outputs = [
+            PipelineOutput(
+                answer="ok",
+                metadata={
+                    "event_type": "final",
+                    "pipeline_session_id": "sdk-session-456",
+                },
+                final=True,
+            ),
+        ]
+        wrapper = _make_chat_wrapper(outputs)
+        wrapper.get_pipeline_session_id.return_value = None
+
+        _collect_events(wrapper)
+
+        wrapper.set_pipeline_session_id.assert_called_once_with(
+            42,
+            "test-client",
+            "sdk-session-456",
+            user_id=None,
+        )
diff --git a/tests/unit/test_copilot_event_adapter.py b/tests/unit/test_copilot_event_adapter.py
new file mode 100644
index 000000000..dcdf1ee57
--- /dev/null
+++ b/tests/unit/test_copilot_event_adapter.py
@@ -0,0 +1,525 @@
+"""Unit tests for CopilotEventAdapter.
+
+Tests the event→PipelineOutput translation, thinking state machine,
+tool lifecycle via hooks, text accumulation, and cancellation.
+"""
+
+import asyncio
+import queue
+import threading
+import time
+from unittest.mock import AsyncMock, MagicMock, patch
+
+import pytest
+
+from src.archi.pipelines.copilot_agents.copilot_event_adapter import (CopilotEventAdapter,
+                                             _ToolCallRecord)
+from src.archi.utils.output_dataclass import PipelineOutput
+
+# ── Helpers ───────────────────────────────────────────────────────────────
+
+
+class FakeAsyncLoop:
+    """Minimal stub for AsyncLoopThread used in tests."""
+
+    def run(self, coro, timeout=5.0):
+        loop = asyncio.new_event_loop()
+        try:
+            return loop.run_until_complete(coro)
+        finally:
+            loop.close()
+
+    def run_no_wait(self, coro):
+        loop = asyncio.new_event_loop()
+        future = asyncio.run_coroutine_threadsafe(coro, loop)
+        return future
+
+
+def _make_event(event_type, **kwargs):
+    """Create a mock SDK event with proper type enum and data object."""
+    try:
+        from copilot.generated.session_events import SessionEventType
+    except ImportError:
+        # SDK not installed locally — use a mock enum that matches by value
+        from enum import Enum
+
+        SessionEventType = Enum(
+            "SessionEventType",
+            {
+                "ASSISTANT_MESSAGE_DELTA": "assistant.message_delta",
+                "ASSISTANT_STREAMING_DELTA": "assistant.streaming_delta",
+                "ASSISTANT_REASONING_DELTA": "assistant.reasoning_delta",
+                "ASSISTANT_MESSAGE": "assistant.message",
+                "ASSISTANT_REASONING": "assistant.reasoning",
+                "ASSISTANT_TURN_END": "assistant.turn_end",
+                "ASSISTANT_USAGE": "assistant.usage",
+                "SESSION_IDLE": "session.idle",
+                "SESSION_ERROR": "session.error",
+                "TOOL_EXECUTION_START": "tool.execution_start",
+                "TOOL_EXECUTION_COMPLETE": "tool.execution_complete",
+            },
+        )
+
+    _type_map = {
+        "assistant.message_delta": SessionEventType.ASSISTANT_MESSAGE_DELTA,
+        "assistant.streaming_delta": SessionEventType.ASSISTANT_STREAMING_DELTA,
+        "assistant.reasoning_delta": SessionEventType.ASSISTANT_REASONING_DELTA,
+        "assistant.message": SessionEventType.ASSISTANT_MESSAGE,
+        "assistant.reasoning": SessionEventType.ASSISTANT_REASONING,
+        "assistant.turn_end": SessionEventType.ASSISTANT_TURN_END,
+        "assistant.usage": SessionEventType.ASSISTANT_USAGE,
+        "session.idle": SessionEventType.SESSION_IDLE,
+        "session.error": SessionEventType.SESSION_ERROR,
+        "tool.execution_start": SessionEventType.TOOL_EXECUTION_START,
+        "tool.execution_complete": SessionEventType.TOOL_EXECUTION_COMPLETE,
+    }
+    ev = MagicMock()
+    ev.type = _type_map.get(event_type, event_type)
+
+    # Build data object with the specified attributes
+    data = MagicMock()
+    for k, v in kwargs.items():
+        setattr(data, k, v)
+    ev.data = data
+    return ev
+
+
+def _fire_events(adapter, events):
+    """Fire events through the adapter's registered event handler."""
+    # Get the handler that was registered via session.on()
+    session = MagicMock()
+    handler_ref = []
+
+    def fake_on(handler):
+        handler_ref.append(handler)
+        return lambda: None
+
+    session.on = fake_on
+    adapter.attach_to_session(session)
+    assert handler_ref, "No handler registered via session.on()"
+
+    handler = handler_ref[0]
+    for event in events:
+        handler(event)
+    adapter.signal_done()
+
+
+def _make_tool_use(*, name="my_tool", args=None, result=None):
+    """Create a hook input dict matching the SDK's PreToolUseHookInput /
+    PostToolUseHookInput TypedDict format."""
+    d = {
+        "toolName": name,
+        "toolArgs": args or {"q": "test"},
+        "timestamp": 1700000000,
+        "cwd": "/tmp",
+    }
+    if result is not None:
+        d["toolResult"] = result
+    return d
+
+
+# ── Tests ─────────────────────────────────────────────────────────────────
+
+
+class TestTextAccumulation:
+    """Decision 14: adapter must accumulate message deltas."""
+
+    def test_message_deltas_accumulate(self):
+        adapter = CopilotEventAdapter(FakeAsyncLoop())
+
+        events = [
+            _make_event("assistant.message_delta", delta_content="Hello"),
+            _make_event("assistant.message_delta", delta_content=" world"),
+        ]
+
+        _fire_events(adapter, events)
+
+        # Drain queue
+        outputs = []
+        while True:
+            item = adapter._queue.get_nowait()
+            if not isinstance(item, PipelineOutput):
+                break
+            outputs.append(item)
+
+        # First delta yields "Hello", second yields "Hello world"
+        text_outputs = [o for o in outputs if o.metadata.get("event_type") == "text"]
+        assert len(text_outputs) == 2
+        assert text_outputs[0].answer == "Hello"
+        assert text_outputs[1].answer == "Hello world"
+
+    def test_final_output_has_full_text(self):
+        adapter = CopilotEventAdapter(FakeAsyncLoop())
+        adapter._response_buffer = "Complete answer"
+
+        final = adapter.build_final_output()
+        assert final.answer == "Complete answer"
+        assert final.final is True
+        assert final.metadata["event_type"] == "final"
+
+
+class TestThinkingStateMachine:
+    """Decision 3: paired thinking_start/thinking_end with step_id."""
+
+    def test_reasoning_delta_starts_thinking(self):
+        adapter = CopilotEventAdapter(FakeAsyncLoop())
+
+        events = [
+            _make_event("assistant.reasoning_delta", delta_content="Let me think..."),
+            _make_event("assistant.message_delta", delta_content="Answer"),
+        ]
+
+        _fire_events(adapter, events)
+
+        outputs = []
+        while not adapter._queue.empty():
+            item = adapter._queue.get_nowait()
+            if isinstance(item, PipelineOutput):
+                outputs.append(item)
+
+        event_types = [o.metadata.get("event_type") for o in outputs]
+        assert "thinking_start" in event_types
+        assert "thinking_end" in event_types
+
+        # thinking_end should contain the thinking content
+        thinking_end = [
+            o for o in outputs if o.metadata.get("event_type") == "thinking_end"
+        ][0]
+        assert "Let me think..." in thinking_end.metadata.get("thinking_content", "")
+
+        # thinking_start and thinking_end share the same step_id
+        thinking_start = [
+            o for o in outputs if o.metadata.get("event_type") == "thinking_start"
+        ][0]
+        assert thinking_start.metadata["step_id"] == thinking_end.metadata["step_id"]
+
+
+class TestToolStreamingEvents:
+    """Tool events via streaming events (tool.execution_start / tool.execution_complete)."""
+
+    def test_tool_execution_start_emits_tool_start(self):
+        adapter = CopilotEventAdapter(FakeAsyncLoop())
+
+        events = [
+            _make_event(
+                "tool.execution_start",
+                tool_call_id="tc-123",
+                tool_name="my_tool",
+                arguments={"q": "test"},
+            ),
+        ]
+        _fire_events(adapter, events)
+
+        outputs = []
+        while not adapter._queue.empty():
+            item = adapter._queue.get_nowait()
+            if isinstance(item, PipelineOutput):
+                outputs.append(item)
+
+        tool_starts = [
+            o for o in outputs if o.metadata.get("event_type") == "tool_start"
+        ]
+        assert len(tool_starts) == 1
+        assert tool_starts[0].metadata["tool_call_id"] == "tc-123"
+        assert tool_starts[0].metadata["tool_name"] == "my_tool"
+        assert tool_starts[0].metadata["tool_args"] == {"q": "test"}
+
+    def test_tool_execution_complete_emits_tool_output(self):
+        adapter = CopilotEventAdapter(FakeAsyncLoop())
+
+        events = [
+            _make_event(
+                "tool.execution_start",
+                tool_call_id="tc-456",
+                tool_name="search",
+                arguments={"q": "hello"},
+            ),
+            _make_event(
+                "tool.execution_complete",
+                tool_call_id="tc-456",
+                result="found it",
+            ),
+        ]
+        _fire_events(adapter, events)
+
+        outputs = []
+        while not adapter._queue.empty():
+            item = adapter._queue.get_nowait()
+            if isinstance(item, PipelineOutput):
+                outputs.append(item)
+
+        tool_outputs = [
+            o for o in outputs if o.metadata.get("event_type") == "tool_output"
+        ]
+        assert len(tool_outputs) == 1
+        assert tool_outputs[0].metadata["tool_call_id"] == "tc-456"
+        assert tool_outputs[0].metadata["output"] == "found it"
+
+    def test_tool_calls_recorded_for_metadata(self):
+        """Decision 12: tool calls stored in metadata."""
+        adapter = CopilotEventAdapter(FakeAsyncLoop())
+
+        events = [
+            _make_event(
+                "tool.execution_start",
+                tool_call_id="tc-789",
+                tool_name="search",
+                arguments={"q": "test"},
+            ),
+            _make_event(
+                "tool.execution_complete",
+                tool_call_id="tc-789",
+                result="found it",
+            ),
+        ]
+        _fire_events(adapter, events)
+
+        assert len(adapter._tool_calls) == 1
+        assert adapter._tool_calls[0].name == "search"
+        assert adapter._tool_calls[0].result == "found it"
+        assert adapter._tool_calls[0].id == "tc-789"
+
+        final = adapter.build_final_output()
+        tc = final.metadata["tool_calls"]
+        assert len(tc) == 1
+        assert tc[0]["name"] == "search"
+        assert tc[0]["result"] == "found it"
+        assert tc[0]["id"] == "tc-789"
+
+    def test_tool_call_id_correlation(self):
+        """Start and complete events should share the same native toolCallId."""
+        adapter = CopilotEventAdapter(FakeAsyncLoop())
+
+        events = [
+            _make_event(
+                "tool.execution_start",
+                tool_call_id="tc-corr-1",
+                tool_name="search",
+                arguments={},
+            ),
+            _make_event(
+                "tool.execution_complete",
+                tool_call_id="tc-corr-1",
+                result="ok",
+            ),
+        ]
+        _fire_events(adapter, events)
+
+        outputs = []
+        while not adapter._queue.empty():
+            item = adapter._queue.get_nowait()
+            if isinstance(item, PipelineOutput):
+                outputs.append(item)
+
+        start_ids = [
+            o.metadata["tool_call_id"]
+            for o in outputs
+            if o.metadata.get("event_type") == "tool_start"
+        ]
+        end_ids = [
+            o.metadata["tool_call_id"]
+            for o in outputs
+            if o.metadata.get("event_type") == "tool_output"
+        ]
+        assert start_ids == ["tc-corr-1"]
+        assert end_ids == ["tc-corr-1"]
+
+    def test_multiple_concurrent_tools(self):
+        """Multiple tools running concurrently should be tracked independently."""
+        adapter = CopilotEventAdapter(FakeAsyncLoop())
+
+        events = [
+            _make_event(
+                "tool.execution_start",
+                tool_call_id="tc-a",
+                tool_name="search",
+                arguments={"q": "a"},
+            ),
+            _make_event(
+                "tool.execution_start",
+                tool_call_id="tc-b",
+                tool_name="fetch",
+                arguments={"url": "b"},
+            ),
+            _make_event(
+                "tool.execution_complete", tool_call_id="tc-b", result="result-b"
+            ),
+            _make_event(
+                "tool.execution_complete", tool_call_id="tc-a", result="result-a"
+            ),
+        ]
+        _fire_events(adapter, events)
+
+        assert len(adapter._tool_calls) == 2
+        assert adapter._tool_calls[0].id == "tc-a"
+        assert adapter._tool_calls[1].id == "tc-b"
+        # Results are matched by ID, not order
+        assert adapter._tool_calls[0].result == "result-a"
+        assert adapter._tool_calls[1].result == "result-b"
+
+    def test_tool_start_ends_thinking(self):
+        """Tool invocation should end active thinking state."""
+        adapter = CopilotEventAdapter(FakeAsyncLoop())
+
+        events = [
+            _make_event("assistant.reasoning_delta", delta_content="Let me think..."),
+            _make_event(
+                "tool.execution_start",
+                tool_call_id="tc-x",
+                tool_name="search",
+                arguments={},
+            ),
+        ]
+        _fire_events(adapter, events)
+
+        outputs = []
+        while not adapter._queue.empty():
+            item = adapter._queue.get_nowait()
+            if isinstance(item, PipelineOutput):
+                outputs.append(item)
+
+        event_types = [o.metadata.get("event_type") for o in outputs]
+        assert "thinking_start" in event_types
+        assert "thinking_end" in event_types
+        # thinking_end comes before tool_start
+        thinking_end_idx = event_types.index("thinking_end")
+        tool_start_idx = event_types.index("tool_start")
+        assert thinking_end_idx < tool_start_idx
+
+    def test_orphan_tool_complete_logs_warning(self, caplog):
+        """tool.execution_complete without matching start logs a warning."""
+        import logging
+
+        adapter = CopilotEventAdapter(FakeAsyncLoop())
+
+        events = [
+            _make_event(
+                "tool.execution_complete",
+                tool_call_id="tc-orphan",
+                result="dangling result",
+            ),
+        ]
+        with caplog.at_level(logging.WARNING):
+            _fire_events(adapter, events)
+
+        assert "unknown tool_call_id=tc-orphan" in caplog.text
+        # Still emits tool_output and tool_end events so the UI doesn't hang
+        outputs = []
+        while not adapter._queue.empty():
+            item = adapter._queue.get_nowait()
+            if isinstance(item, PipelineOutput):
+                outputs.append(item)
+        event_types = [o.metadata.get("event_type") for o in outputs]
+        assert "tool_output" in event_types
+        assert "tool_end" in event_types
+
+
+class TestUsageCapture:
+    """Decision 20: usage metadata normalization."""
+
+    def test_capture_usage_dict(self):
+        adapter = CopilotEventAdapter(FakeAsyncLoop())
+        adapter._capture_usage(
+            {
+                "prompt_tokens": 100,
+                "completion_tokens": 50,
+                "total_tokens": 150,
+            }
+        )
+        assert adapter._usage["prompt_tokens"] == 100
+        assert adapter._usage["completion_tokens"] == 50
+        assert adapter._usage["total_tokens"] == 150
+
+    def test_capture_usage_object_camelcase(self):
+        adapter = CopilotEventAdapter(FakeAsyncLoop())
+        usage = MagicMock(spec=[])
+        usage.input_tokens = 200
+        usage.output_tokens = 80
+        adapter._capture_usage(usage)
+        assert adapter._usage["prompt_tokens"] == 200
+        assert adapter._usage["completion_tokens"] == 80
+
+    def test_usage_in_final_output(self):
+        adapter = CopilotEventAdapter(FakeAsyncLoop())
+        adapter._capture_usage(
+            {"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15}
+        )
+        final = adapter.build_final_output()
+        assert final.metadata["usage"]["total_tokens"] == 15
+
+
+class TestIterOutputs:
+    """Test the sync generator bridge."""
+
+    def test_iter_outputs_drains_queue(self):
+        adapter = CopilotEventAdapter(FakeAsyncLoop())
+        adapter._queue.put(
+            PipelineOutput(answer="a", metadata={"event_type": "text"}, final=False)
+        )
+        adapter._queue.put(
+            PipelineOutput(answer="b", metadata={"event_type": "text"}, final=False)
+        )
+        from src.archi.pipelines.copilot_agents.copilot_event_adapter import _SENTINEL
+
+        adapter._queue.put(_SENTINEL)
+
+        results = list(adapter.iter_outputs())
+        assert len(results) == 2
+        assert results[0].answer == "a"
+        assert results[1].answer == "b"
+
+
+class TestBuildFinalOutput:
+    """Test final output construction."""
+
+    def test_source_documents_included(self):
+        adapter = CopilotEventAdapter(FakeAsyncLoop())
+        adapter._response_buffer = "answer"
+
+        doc = MagicMock()
+        doc.page_content = "some content"
+        final = adapter.build_final_output(
+            source_documents=[doc],
+            retriever_scores=[0.95],
+        )
+        assert len(final.source_documents) == 1
+        assert final.metadata["retriever_scores"] == [0.95]
+
+
+class TestIterOutputsTimeout:
+    """Ensure iter_outputs doesn't block forever if signal_done is never called."""
+
+    def test_queue_timeout_unblocks(self):
+        """If signal_done() is never called, iter_outputs should still
+        return after poll_timeout rather than hanging forever."""
+        adapter = CopilotEventAdapter(FakeAsyncLoop())
+        adapter._queue.put(
+            PipelineOutput(answer="ok", metadata={"event_type": "text"}, final=False)
+        )
+        # No sentinel pushed — simulates async session crash
+
+        results = list(adapter.iter_outputs(poll_timeout=0.1))
+        assert len(results) == 1
+        assert results[0].answer == "ok"
+
+
+class TestSignalDoneUsageWarning:
+    """Bug fix: signal_done logs a warning when no usage data received."""
+
+    def test_warning_when_no_usage(self, caplog):
+        import logging
+
+        adapter = CopilotEventAdapter(FakeAsyncLoop())
+        assert adapter._usage is None
+        with caplog.at_level(logging.WARNING):
+            adapter.signal_done()
+        assert "No usage data received" in caplog.text
+
+    def test_no_warning_when_usage_present(self, caplog):
+        import logging
+
+        adapter = CopilotEventAdapter(FakeAsyncLoop())
+        adapter._usage = {"prompt_tokens": 10, "completion_tokens": 5}
+        with caplog.at_level(logging.WARNING):
+            adapter.signal_done()
+        assert "No usage data received" not in caplog.text
diff --git a/tests/unit/test_copilot_pipeline.py b/tests/unit/test_copilot_pipeline.py
new file mode 100644
index 000000000..964b60eb0
--- /dev/null
+++ b/tests/unit/test_copilot_pipeline.py
@@ -0,0 +1,455 @@
+"""Unit tests for CopilotAgentPipeline helper functions.
+
+Tests the provider mapping, history formatter, MCP config passthrough,
+and tool registry without requiring the Copilot SDK to be installed.
+"""
+
+import pytest
+
+from src.archi.pipelines.copilot_agents.copilot_agent import (_build_mcp_servers,
+                                               _build_sdk_provider,
+                                               _build_tool_restriction_kwargs)
+
+
+class TestProviderMapping:
+    """Decision 4: BYOK provider mapping."""
+
+    def test_openai_provider(self):
+        result = _build_sdk_provider("openai", "gpt-4o", {}, api_key="sk-test")
+        assert result["type"] == "openai"
+        assert result["api_key"] == "sk-test"
+        # model is passed separately to create_session, not in provider dict
+        assert "model" not in result
+
+    def test_anthropic_provider(self):
+        result = _build_sdk_provider(
+            "anthropic", "claude-sonnet-4-20250514", {}, api_key="key"
+        )
+        assert result["type"] == "anthropic"
+        assert result["api_key"] == "key"
+        assert "model" not in result
+
+    def test_openrouter_maps_to_openai(self):
+        cfg = {"openrouter": {"base_url": "https://openrouter.ai/api/v1"}}
+        result = _build_sdk_provider(
+            "openrouter", "google/gemini-2.0-flash", cfg, api_key="or-key"
+        )
+        assert result["type"] == "openai"
+        assert result["base_url"] == "https://openrouter.ai/api/v1"
+        assert result["api_key"] == "or-key"
+
+    def test_local_ollama_maps_to_openai(self):
+        cfg = {"local": {"base_url": "http://localhost:11434/v1"}}
+        result = _build_sdk_provider("local", "llama3", cfg)
+        assert result["type"] == "openai"
+        assert result["base_url"] == "http://localhost:11434/v1"
+        assert "api_key" not in result  # Ollama doesn't need one
+
+    def test_unsupported_provider_raises(self):
+        with pytest.raises(ValueError, match="cannot be mapped"):
+            _build_sdk_provider("gemini", "gemini-pro", {})
+
+    def test_base_url_from_config(self):
+        cfg = {"openai": {"base_url": "https://custom.endpoint/v1"}}
+        result = _build_sdk_provider("openai", "gpt-4o", cfg, api_key="k")
+        assert result["base_url"] == "https://custom.endpoint/v1"
+
+
+class TestMCPPassthrough:
+    """Decision 8: MCP config mapping."""
+
+    def test_no_mcp_config(self):
+        assert _build_mcp_servers({}) is None
+        assert _build_mcp_servers({"other": "stuff"}) is None
+
+    def test_stdio_server(self):
+        config = {
+            "mcp_servers": {
+                "my_server": {
+                    "transport": "stdio",
+                    "command": "uvx",
+                    "args": ["mcp-server-example"],
+                }
+            }
+        }
+        result = _build_mcp_servers(config)
+        assert result is not None
+        assert result["my_server"]["type"] == "stdio"
+        assert result["my_server"]["command"] == "uvx"
+        assert "transport" not in result["my_server"]
+
+    def test_sse_server(self):
+        config = {
+            "mcp_servers": {
+                "web_search": {
+                    "transport": "sse",
+                    "url": "http://localhost:8080/sse",
+                }
+            }
+        }
+        result = _build_mcp_servers(config)
+        assert result["web_search"]["type"] == "sse"
+        assert result["web_search"]["url"] == "http://localhost:8080/sse"
+
+    def test_multiple_servers(self):
+        config = {
+            "mcp_servers": {
+                "a": {"transport": "stdio", "command": "cmd_a"},
+                "b": {"transport": "sse", "url": "http://b:8080"},
+            }
+        }
+        result = _build_mcp_servers(config)
+        assert len(result) == 2
+        assert "a" in result
+        assert "b" in result
+
+
+class TestToolRegistry:
+    """Decision 17: TOOL_REGISTRY from tools module."""
+
+    def test_registry_has_expected_tools(self):
+        from src.archi.pipelines.copilot_agents.tools import TOOL_REGISTRY
+
+        expected = {
+            "search_vectorstore_hybrid",
+            "search_local_files",
+            "search_metadata_index",
+            "list_metadata_schema",
+            "fetch_catalog_document",
+            "monit_opensearch_search",
+            "monit_opensearch_aggregation",
+        }
+        assert expected == set(TOOL_REGISTRY.keys())
+
+    def test_registry_entries_have_factory_and_description(self):
+        from src.archi.pipelines.copilot_agents.tools import TOOL_REGISTRY
+
+        for name, entry in TOOL_REGISTRY.items():
+            assert "factory" in entry, f"{name} missing factory"
+            assert "description" in entry, f"{name} missing description"
+            assert callable(entry["factory"]), f"{name} factory not callable"
+            assert isinstance(entry["description"], str), f"{name} description not str"
+
+
+class TestToolRestrictions:
+    """SDK built-in tools must be hard-blocked for Archi sessions."""
+
+    def test_allowlist_contains_only_custom_tool_names(self):
+        """available_tools should list exactly the custom tools passed in."""
+        from unittest.mock import MagicMock
+
+        tool_a = MagicMock()
+        tool_a.name = "search_vectorstore_hybrid"
+        tool_b = MagicMock()
+        tool_b.name = "rucio_events_search"
+
+        kwargs = _build_tool_restriction_kwargs([tool_a, tool_b])
+        assert sorted(kwargs["available_tools"]) == [
+            "rucio_events_search",
+            "search_vectorstore_hybrid",
+        ]
+        assert "excluded_tools" not in kwargs
+
+    def test_empty_tools_yields_empty_allowlist(self):
+        """When no custom tools exist, available_tools is empty — blocking everything."""
+        kwargs = _build_tool_restriction_kwargs([])
+        assert kwargs["available_tools"] == []
+        assert "excluded_tools" not in kwargs
+
+
+class TestPermissionRequests:
+    """Only declared Archi custom tools should be approved."""
+
+    def _make_pipeline(self, selected_tool_names=None):
+        from src.archi.pipelines.copilot_agents.copilot_agent import CopilotAgentPipeline
+
+        pipeline = CopilotAgentPipeline.__new__(CopilotAgentPipeline)
+        pipeline.selected_tool_names = list(selected_tool_names or [])
+        return pipeline
+
+    def test_approves_allowed_custom_tool(self):
+        from copilot.generated.session_events import (PermissionRequest,
+                                                      PermissionRequestKind)
+
+        pipeline = self._make_pipeline(["search_local_files"])
+        request = PermissionRequest(
+            kind=PermissionRequestKind.CUSTOM_TOOL, tool_name="search_local_files"
+        )
+
+        result = pipeline._on_permission_request(request, {"toolCallId": "1"})
+
+        assert result.kind == "approved"
+
+    def test_denies_builtin_shell_request(self):
+        from copilot.generated.session_events import (PermissionRequest,
+                                                      PermissionRequestKind)
+
+        pipeline = self._make_pipeline(["search_local_files"])
+        request = PermissionRequest(
+            kind=PermissionRequestKind.SHELL,
+            tool_name="bash",
+            full_command_text="pwd",
+        )
+
+        result = pipeline._on_permission_request(request, {"toolCallId": "2"})
+
+        assert result.kind == "denied"
+        assert "Only Archi custom tools" in result.message
+
+    def test_denies_custom_tool_not_in_agent_spec(self):
+        from copilot.generated.session_events import (PermissionRequest,
+                                                      PermissionRequestKind)
+
+        pipeline = self._make_pipeline(["search_local_files"])
+        request = PermissionRequest(
+            kind=PermissionRequestKind.CUSTOM_TOOL, tool_name="read_file"
+        )
+
+        result = pipeline._on_permission_request(request, {"toolCallId": "3"})
+
+        assert result.kind == "denied"
+        assert "not allowed" in result.message
+
+
+class TestGetToolRegistrySignature:
+    """get_tool_registry and get_tool_descriptions must work when called
+    via the same pattern as app.py: agent_cls.method(dummy_instance)."""
+
+    def test_get_tool_registry_instance_call(self):
+        from src.archi.pipelines.copilot_agents.copilot_agent import CopilotAgentPipeline
+
+        dummy = CopilotAgentPipeline.__new__(CopilotAgentPipeline)
+        # This is how app.py calls it — must not raise
+        registry = CopilotAgentPipeline.get_tool_registry(dummy)
+        assert isinstance(registry, dict)
+        assert "search_vectorstore_hybrid" in registry
+
+    def test_get_tool_descriptions_instance_call(self):
+        from src.archi.pipelines.copilot_agents.copilot_agent import CopilotAgentPipeline
+
+        dummy = CopilotAgentPipeline.__new__(CopilotAgentPipeline)
+        descriptions = CopilotAgentPipeline.get_tool_descriptions(dummy)
+        assert isinstance(descriptions, dict)
+        assert "search_vectorstore_hybrid" in descriptions
+        assert isinstance(descriptions["search_vectorstore_hybrid"], str)
+
+
+class TestSessionConfigOverrides:
+    """Bug #15/#16: per-request provider/model/api_key overrides."""
+
+    def _make_pipeline(self):
+        from src.archi.pipelines.copilot_agents.copilot_agent import CopilotAgentPipeline
+
+        p = CopilotAgentPipeline.__new__(CopilotAgentPipeline)
+        p.default_provider = "openai"
+        p.default_model = "gpt-4o"
+        p._providers_config = {}
+        p.agent_prompt = "You are a test bot"
+        p.archi_config = {}
+        return p
+
+    def test_default_provider_used_when_no_override(self):
+        p = self._make_pipeline()
+        cfg = p._build_session_config(tools=[], api_key="sk-test")
+        assert cfg["model"] == "gpt-4o"
+        assert cfg["provider"]["type"] == "openai"
+        assert cfg["provider"]["api_key"] == "sk-test"
+
+    def test_provider_override(self):
+        p = self._make_pipeline()
+        cfg = p._build_session_config(
+            tools=[],
+            api_key="ant-key",
+            provider_override="anthropic",
+            model_override="claude-sonnet-4-20250514",
+        )
+        assert cfg["model"] == "claude-sonnet-4-20250514"
+        assert cfg["provider"]["type"] == "anthropic"
+        assert cfg["provider"]["api_key"] == "ant-key"
+
+    def test_partial_override_only_model(self):
+        """If only model is overridden, provider stays default."""
+        p = self._make_pipeline()
+        cfg = p._build_session_config(
+            tools=[],
+            api_key="k",
+            model_override="gpt-4o-mini",
+        )
+        assert cfg["model"] == "gpt-4o-mini"
+        assert cfg["provider"]["type"] == "openai"
+
+    def test_partial_override_only_provider(self):
+        """If only provider is overridden, model stays default."""
+        p = self._make_pipeline()
+        cfg = p._build_session_config(
+            tools=[],
+            api_key="k",
+            provider_override="anthropic",
+        )
+        assert cfg["model"] == "gpt-4o"
+        assert cfg["provider"]["type"] == "anthropic"
+
+    def test_api_key_forwarded(self):
+        """API key is forwarded to provider dict."""
+        p = self._make_pipeline()
+        cfg = p._build_session_config(tools=[], api_key="session-key-123")
+        assert cfg["provider"]["api_key"] == "session-key-123"
+
+
+class TestSessionResume:
+    """Session resume failure should not reuse a bad session_id."""
+
+    def test_session_id_cleared_on_resume_failure(self):
+        """When resume_session() fails, the fallback create_session() should not
+        reuse the old session_id that failed."""
+        import asyncio
+        from unittest.mock import AsyncMock, MagicMock
+
+        from src.archi.pipelines.copilot_agents.copilot_agent import CopilotAgentPipeline
+
+        p = CopilotAgentPipeline.__new__(CopilotAgentPipeline)
+        p.default_provider = "openai"
+        p.default_model = "gpt-4o"
+        p._providers_config = {}
+        p.agent_prompt = "test"
+        p.archi_config = {}
+        p.selected_tool_names = None
+
+        mock_client = MagicMock()
+        mock_client.resume_session = AsyncMock(
+            side_effect=Exception("session not found")
+        )
+        mock_session = MagicMock()
+        mock_client.create_session = AsyncMock(return_value=mock_session)
+        p._client = mock_client
+
+        adapter = MagicMock()
+
+        loop = asyncio.new_event_loop()
+        config = p._build_session_config(tools=[], api_key=None)
+        session = loop.run_until_complete(
+            p._create_session(adapter, config, session_id="bad-session-id")
+        )
+        loop.close()
+
+        # create_session should NOT have session_id= in its kwargs
+        call_kwargs = mock_client.create_session.call_args
+        assert "session_id" not in call_kwargs.kwargs
+        # But it should still have been called
+        mock_client.create_session.assert_called_once()
+
+
+class TestCustomizeMode:
+    """System message uses customize mode with per-section overrides."""
+
+    def _make_pipeline(self, prompt="You are a test bot"):
+        from src.archi.pipelines.copilot_agents.copilot_agent import CopilotAgentPipeline
+
+        p = CopilotAgentPipeline.__new__(CopilotAgentPipeline)
+        p.default_provider = "openai"
+        p.default_model = "gpt-4o"
+        p._providers_config = {}
+        p.agent_prompt = prompt
+        p.archi_config = {}
+        return p
+
+    def test_customize_mode_with_identity_section(self):
+        p = self._make_pipeline("You are a CMS computing assistant")
+        cfg = p._build_session_config(tools=[], api_key="k")
+
+        sm = cfg["system_message"]
+        assert sm["mode"] == "customize"
+        assert "sections" in sm
+        assert sm["sections"]["identity"]["action"] == "replace"
+        assert (
+            sm["sections"]["identity"]["content"] == "You are a CMS computing assistant"
+        )
+
+    def test_no_system_message_without_prompt(self):
+        p = self._make_pipeline(prompt=None)
+        cfg = p._build_session_config(tools=[], api_key="k")
+        assert "system_message" not in cfg
+
+    def test_sdk_defaults_not_overridden(self):
+        """safety, tool_efficiency, and code_change_rules should stay SDK-managed."""
+        p = self._make_pipeline()
+        cfg = p._build_session_config(tools=[], api_key="k")
+        sections = cfg["system_message"]["sections"]
+        for section in ("safety", "tool_efficiency", "code_change_rules"):
+            assert section not in sections
+
+    def test_no_history_in_system_message(self):
+        """History is no longer injected — session persistence handles it."""
+        p = self._make_pipeline()
+        cfg = p._build_session_config(tools=[], api_key="k")
+        sm = cfg["system_message"]
+        # No content key at all, just sections
+        assert "content" not in sm or "<conversation_history>" not in str(
+            sm.get("content", "")
+        )
+
+
+class TestErrorHook:
+    """onErrorOccurred hook: retry transient model errors."""
+
+    def _make_pipeline(self):
+        from src.archi.pipelines.copilot_agents.copilot_agent import CopilotAgentPipeline
+
+        p = CopilotAgentPipeline.__new__(CopilotAgentPipeline)
+        return p
+
+    def test_recoverable_model_error_returns_retry(self):
+        p = self._make_pipeline()
+        result = p._on_error_occurred(
+            {
+                "error": "Rate limit exceeded",
+                "errorContext": "model_call",
+                "recoverable": True,
+                "timestamp": 1,
+                "cwd": "/",
+            }
+        )
+        assert result is not None
+        assert result["errorHandling"] == "retry"
+        assert result["retryCount"] == 2
+        assert "retry" in result["userNotification"].lower()
+
+    def test_non_recoverable_error_returns_none(self):
+        p = self._make_pipeline()
+        result = p._on_error_occurred(
+            {
+                "error": "Invalid API key",
+                "errorContext": "model_call",
+                "recoverable": False,
+                "timestamp": 1,
+                "cwd": "/",
+            }
+        )
+        assert result is None
+
+    def test_tool_execution_error_not_retried(self):
+        p = self._make_pipeline()
+        result = p._on_error_occurred(
+            {
+                "error": "Tool crashed",
+                "errorContext": "tool_execution",
+                "recoverable": True,
+                "timestamp": 1,
+                "cwd": "/",
+            }
+        )
+        # Only model_call errors are retried
+        assert result is None
+
+    def test_system_error_not_retried(self):
+        p = self._make_pipeline()
+        result = p._on_error_occurred(
+            {
+                "error": "System error",
+                "errorContext": "system",
+                "recoverable": True,
+                "timestamp": 1,
+                "cwd": "/",
+            }
+        )
+        assert result is None
diff --git a/tests/unit/test_import_sanity.py b/tests/unit/test_import_sanity.py
new file mode 100644
index 000000000..954b5d117
--- /dev/null
+++ b/tests/unit/test_import_sanity.py
@@ -0,0 +1,37 @@
+"""Import sanity test — regression for bug #8 (circular import).
+
+Verifies that the core modules can be imported without circular
+dependency errors in a clean Python process.
+"""
+
+import subprocess
+import sys
+
+import pytest
+
+
+class TestNoCircularImports:
+    """Regression for bug #8: circular import between modules."""
+
+    @pytest.mark.parametrize(
+        "module",
+        [
+            "src.archi.archi",
+            "src.archi.pipelines.copilot_agents.copilot_event_adapter",
+            "src.archi.utils.output_dataclass",
+            "src.archi.pipelines.agents.tools.local_files",
+        ],
+    )
+    def test_module_imports_cleanly(self, module):
+        """Each module must be importable without ImportError in a clean process."""
+        result = subprocess.run(
+            [sys.executable, "-c", f"import {module}"],
+            capture_output=True,
+            text=True,
+            timeout=15,
+        )
+        assert result.returncode == 0, (
+            f"Failed to import {module}:\n"
+            f"stdout: {result.stdout}\n"
+            f"stderr: {result.stderr}"
+        )
diff --git a/tests/unit/test_pipeline_integration.py b/tests/unit/test_pipeline_integration.py
new file mode 100644
index 000000000..76e47c8ec
--- /dev/null
+++ b/tests/unit/test_pipeline_integration.py
@@ -0,0 +1,374 @@
+"""Layer 2 integration tests: pipeline → adapter → output flow.
+
+Tests the wiring between CopilotAgentPipeline, CopilotEventAdapter,
+and Archi.stream() without requiring a real Copilot SDK session.
+"""
+
+import asyncio
+import concurrent.futures
+import queue
+import threading
+from unittest.mock import AsyncMock, MagicMock, PropertyMock, patch
+
+import pytest
+
+from src.archi.pipelines.copilot_agents.copilot_event_adapter import _SENTINEL, CopilotEventAdapter
+from src.archi.utils.output_dataclass import PipelineOutput
+
+# ── Helpers ───────────────────────────────────────────────────────────────
+
+
+class FakeAsyncLoop:
+    """Stub for AsyncLoopThread that runs coroutines inline."""
+
+    def run(self, coro, timeout=5.0):
+        loop = asyncio.new_event_loop()
+        try:
+            return loop.run_until_complete(coro)
+        finally:
+            loop.close()
+
+    def run_no_wait(self, coro):
+        loop = asyncio.new_event_loop()
+        future = asyncio.run_coroutine_threadsafe(coro, loop)
+        return future
+
+    def submit(self, coro):
+        """Schedule coro and return immediately."""
+        loop = asyncio.new_event_loop()
+        return asyncio.run_coroutine_threadsafe(coro, loop)
+
+
+class ImmediateAsyncLoop:
+    """Run background coroutines synchronously and return a settled Future."""
+
+    def run_no_wait(self, coro):
+        future = concurrent.futures.Future()
+        loop = asyncio.new_event_loop()
+        try:
+            result = loop.run_until_complete(coro)
+            future.set_result(result)
+        except Exception as exc:
+            future.set_exception(exc)
+        finally:
+            loop.close()
+        return future
+
+
+# ── Archi.stream() passthrough tests ──────────────────────────────────────
+
+
+class TestArchiStreamPassthrough:
+    """Verify kwargs flow from Archi.stream() → pipeline.stream()."""
+
+    def test_kwargs_forwarded_to_pipeline(self):
+        """All kwargs including provider/model/api_key reach the pipeline."""
+        from src.archi.archi import archi as ArchiClass
+
+        mock_pipeline = MagicMock()
+        mock_pipeline.stream = MagicMock(
+            return_value=iter(
+                [
+                    PipelineOutput(
+                        answer="ok", metadata={"event_type": "final"}, final=True
+                    ),
+                ]
+            )
+        )
+
+        instance = ArchiClass.__new__(ArchiClass)
+        instance.pipeline = mock_pipeline
+        instance.pipeline_name = "test"
+        instance.vs_connector = MagicMock()
+        instance.vs_connector.get_vectorstore.return_value = MagicMock()
+
+        results = list(
+            instance.stream(
+                history=[("user", "hi")],
+                conversation_id=1,
+                provider="anthropic",
+                model="claude-sonnet-4-20250514",
+                provider_api_key="sk-test",
+            )
+        )
+
+        assert len(results) == 1
+        call_kwargs = mock_pipeline.stream.call_args.kwargs
+        assert call_kwargs["provider"] == "anthropic"
+        assert call_kwargs["model"] == "claude-sonnet-4-20250514"
+        assert call_kwargs["provider_api_key"] == "sk-test"
+        assert call_kwargs["history"] == [("user", "hi")]
+        # vectorstore is injected by _prepare_call_kwargs
+        assert "vectorstore" in call_kwargs
+
+    def test_pipeline_output_type_enforced(self):
+        """Archi.stream() raises TypeError if pipeline yields non-PipelineOutput."""
+        from src.archi.archi import archi as ArchiClass
+
+        mock_pipeline = MagicMock()
+        mock_pipeline.stream = MagicMock(return_value=iter(["bad string"]))
+
+        instance = ArchiClass.__new__(ArchiClass)
+        instance.pipeline = mock_pipeline
+        instance.pipeline_name = "test"
+        instance.vs_connector = MagicMock()
+        instance.vs_connector.get_vectorstore.return_value = MagicMock()
+
+        with pytest.raises(TypeError, match="PipelineOutput"):
+            list(instance.stream(history=[]))
+
+
+# ── Pipeline session config integration tests ─────────────────────────────
+
+
+class TestPipelineSessionConfig:
+    """Verify _build_session_config produces correct SDK config."""
+
+    def _make_pipeline(self, **overrides):
+        from src.archi.pipelines.copilot_agents.copilot_agent import CopilotAgentPipeline
+
+        p = CopilotAgentPipeline.__new__(CopilotAgentPipeline)
+        p.default_provider = overrides.get("provider", "openai")
+        p.default_model = overrides.get("model", "gpt-4o")
+        p._providers_config = overrides.get("providers_config", {})
+        p.agent_prompt = overrides.get("prompt", "Test prompt")
+        p.archi_config = overrides.get("archi_config", {})
+        return p
+
+    def test_system_message_uses_customize_mode(self):
+        p = self._make_pipeline(prompt="You are helpful")
+        cfg = p._build_session_config(
+            tools=[],
+            api_key="k",
+        )
+        sys_msg = cfg["system_message"]
+        assert sys_msg["mode"] == "customize"
+        assert sys_msg["sections"]["identity"]["action"] == "replace"
+        assert "You are helpful" in sys_msg["sections"]["identity"]["content"]
+
+    def test_mcp_servers_included_when_configured(self):
+        p = self._make_pipeline(
+            archi_config={
+                "mcp_servers": {
+                    "test_server": {"transport": "stdio", "command": "test-cmd"}
+                }
+            }
+        )
+        cfg = p._build_session_config(tools=[], api_key="k")
+        assert "mcp_servers" in cfg
+        assert "test_server" in cfg["mcp_servers"]
+
+    def test_tools_in_special_key(self):
+        """Tools go in _tools key, popped by _create_session."""
+        p = self._make_pipeline()
+        fake_tools = [MagicMock(), MagicMock()]
+        cfg = p._build_session_config(tools=fake_tools, api_key="k")
+        assert cfg["_tools"] == fake_tools
+
+
+# ── Adapter → Pipeline output flow ───────────────────────────────────────
+
+
+class TestAdapterOutputFlow:
+    """Test that adapter produces correct PipelineOutput sequence."""
+
+    def test_text_then_done_produces_outputs(self):
+        adapter = CopilotEventAdapter(FakeAsyncLoop())
+
+        # Simulate: text chunk, then signal done
+        adapter._queue.put(
+            PipelineOutput(
+                answer="Hello ",
+                metadata={"event_type": "text"},
+                final=False,
+            )
+        )
+        adapter._queue.put(
+            PipelineOutput(
+                answer="world",
+                metadata={"event_type": "text"},
+                final=False,
+            )
+        )
+        adapter.signal_done()
+
+        outputs = list(adapter.iter_outputs(poll_timeout=2.0))
+        assert len(outputs) == 2
+        assert outputs[0].answer == "Hello "
+        assert outputs[1].answer == "world"
+
+    def test_build_final_aggregates_tool_calls(self):
+        from src.archi.pipelines.copilot_agents.copilot_event_adapter import _ToolCallRecord
+
+        adapter = CopilotEventAdapter(FakeAsyncLoop())
+        adapter._response_buffer = "Final answer"
+        adapter._tool_calls = [
+            _ToolCallRecord(
+                id="tc-1", name="search_vectorstore_hybrid", args={"query": "test"}
+            ),
+            _ToolCallRecord(
+                id="tc-2", name="fetch_catalog_document", args={"url": "http://x"}
+            ),
+        ]
+        adapter._tool_calls[0].result = "found docs"
+        adapter._tool_calls[1].result = "page content"
+        adapter._usage = {"prompt_tokens": 100, "completion_tokens": 50}
+
+        final = adapter.build_final_output(source_documents=["doc1"])
+        assert final.answer == "Final answer"
+        assert final.final is True
+        assert final.source_documents == ["doc1"]
+        assert final.metadata["event_type"] == "final"
+        assert len(final.metadata["tool_calls"]) == 2
+        assert final.metadata["tool_calls"][0]["name"] == "search_vectorstore_hybrid"
+        assert final.metadata["usage"]["prompt_tokens"] == 100
+
+    def test_build_final_no_usage_omits_key(self):
+        adapter = CopilotEventAdapter(FakeAsyncLoop())
+        adapter._response_buffer = "answer"
+        final = adapter.build_final_output()
+        assert "usage" not in final.metadata
+
+
+# ── Stream kwargs extraction ──────────────────────────────────────────────
+
+
+class TestStreamKwargsExtraction:
+    """Verify CopilotAgentPipeline.stream() correctly extracts per-request
+    overrides. Tests the setup logic without running the async session."""
+
+    def _make_pipeline(self):
+        from src.archi.pipelines.copilot_agents.copilot_agent import CopilotAgentPipeline
+
+        p = CopilotAgentPipeline.__new__(CopilotAgentPipeline)
+        p.default_provider = "openai"
+        p.default_model = "gpt-4o"
+        p._providers_config = {}
+        p.agent_prompt = "test"
+        p.archi_config = {}
+        p.dm_config = {}
+        p._catalog_client = None
+        p._monit_client = None
+        p.selected_tool_names = []
+        p._async_loop = FakeAsyncLoop()
+        return p
+
+    def test_session_api_key_preferred_over_db_key(self):
+        """Session-provided API key takes precedence over DB-stored key."""
+        p = self._make_pipeline()
+
+        with (
+            patch.object(p, "_resolve_byok_key", return_value="db-stored-key"),
+            patch.object(p, "_build_tools", return_value=[]),
+            patch.object(
+                p, "_build_session_config", return_value={"_tools": []}
+            ) as mock_cfg,
+        ):
+            # Call _build_session_config indirectly by simulating stream setup
+            # We replicate the key extraction logic from stream()
+            session_api_key = "session-key"
+            db_key = p._resolve_byok_key("u1")
+            api_key = session_api_key or db_key
+            assert api_key == "session-key"
+
+    def test_db_key_used_when_no_session_key(self):
+        """Without session API key, falls back to DB-stored key."""
+        p = self._make_pipeline()
+
+        with patch.object(p, "_resolve_byok_key", return_value="db-key"):
+            session_api_key = None
+            db_key = p._resolve_byok_key("u1")
+            api_key = session_api_key or db_key
+            assert api_key == "db-key"
+
+    def test_provider_model_override_to_config(self):
+        """Provider/model overrides reach _build_session_config correctly."""
+        p = self._make_pipeline()
+        cfg = p._build_session_config(
+            tools=[],
+            api_key="k",
+            provider_override="anthropic",
+            model_override="claude-sonnet-4-20250514",
+        )
+        assert cfg["model"] == "claude-sonnet-4-20250514"
+        assert cfg["provider"]["type"] == "anthropic"
+
+    def test_no_override_uses_defaults(self):
+        """Without overrides, defaults are used."""
+        p = self._make_pipeline()
+        cfg = p._build_session_config(tools=[], api_key="k")
+        assert cfg["model"] == "gpt-4o"
+        assert cfg["provider"]["type"] == "openai"
+
+
+class TestCopilotSessionPersistence:
+    """Verify persisted SDK session IDs flow through the pipeline."""
+
+    def test_stream_uses_persisted_session_id_and_returns_active_id(self):
+        from src.archi.pipelines.copilot_agents.copilot_agent import CopilotAgentPipeline
+
+        class FakeSession:
+            session_id = "sdk-session-created"
+
+            async def send_and_wait(self, *_args, **_kwargs):
+                return None
+
+        class FakeAdapter:
+            def __init__(self, _async_loop):
+                self._session = None
+
+            def attach_to_session(self, session):
+                self._session = session
+
+            def iter_outputs(self):
+                return iter([])
+
+            def signal_done(self):
+                return None
+
+            def build_final_output(
+                self, *, source_documents=None, retriever_scores=None
+            ):
+                return PipelineOutput(
+                    answer="ok",
+                    source_documents=source_documents or [],
+                    metadata={"event_type": "final"},
+                    final=True,
+                )
+
+        p = CopilotAgentPipeline.__new__(CopilotAgentPipeline)
+        p.default_provider = "openai"
+        p.default_model = "gpt-4o"
+        p._providers_config = {}
+        p.agent_prompt = "test"
+        p.archi_config = {}
+        p.dm_config = {}
+        p._catalog_client = None
+        p._monit_client = None
+        p.selected_tool_names = []
+        p._async_loop = ImmediateAsyncLoop()
+        p._build_tools = MagicMock(return_value=[])
+        p._resolve_byok_key = MagicMock(return_value=None)
+
+        captured = {}
+
+        async def fake_create_session(adapter, config, *, session_id=None):
+            captured["session_id"] = session_id
+            return FakeSession(), True
+
+        p._create_session = fake_create_session
+
+        with patch(
+            "src.archi.pipelines.copilot_agents.copilot_agent.CopilotEventAdapter", FakeAdapter
+        ):
+            outputs = list(
+                p.stream(
+                    history=[("user", "Hello")],
+                    conversation_id=42,
+                    pipeline_session_id="sdk-session-existing",
+                    vectorstore=None,
+                )
+            )
+
+        assert captured["session_id"] == "sdk-session-existing"
+        assert outputs[-1].metadata["pipeline_session_id"] == "sdk-session-created"
diff --git a/tests/unit/test_ticket_manager.py b/tests/unit/test_ticket_manager.py
new file mode 100644
index 000000000..d6dc74afe
--- /dev/null
+++ b/tests/unit/test_ticket_manager.py
@@ -0,0 +1,149 @@
+"""Unit tests for TicketManager._collect_from_client error handling.
+
+Regression for bug #11: generator iteration outside try/except
+caused AuthError from Redmine to crash the entire ingestion pipeline.
+"""
+
+from unittest.mock import MagicMock, patch
+
+import pytest
+
+
+class TestCollectFromClientErrorHandling:
+    """Verify _collect_from_client handles errors during generator iteration."""
+
+    def _make_manager(self):
+        """Create a TicketManager with mocked config dependencies."""
+        with patch(
+            "src.data_manager.collectors.tickets.ticket_manager.get_global_config"
+        ) as mock_gc:
+            mock_gc.return_value = {"DATA_PATH": "/tmp/test_data"}
+            with patch("src.data_manager.collectors.tickets.ticket_manager.JiraClient"):
+                with patch(
+                    "src.data_manager.collectors.tickets.ticket_manager.RedmineClient"
+                ):
+                    from src.data_manager.collectors.tickets.ticket_manager import \
+                        TicketManager
+
+                    dm_config = {
+                        "sources": {
+                            "jira": {"enabled": False},
+                            "redmine": {"enabled": False},
+                        }
+                    }
+                    return TicketManager(dm_config=dm_config)
+
+    def test_generator_exception_caught_gracefully(self):
+        """If client.collect() returns a generator that raises mid-iteration,
+        the error must be caught and logged, not propagated."""
+        manager = self._make_manager()
+        persistence = MagicMock()
+
+        # Create a client whose collect() returns a generator that raises
+        mock_client = MagicMock()
+        call_count = 0
+
+        def failing_generator(**kwargs):
+            nonlocal call_count
+            yield MagicMock()  # First resource succeeds
+            call_count += 1
+            raise Exception("AuthError: Invalid API key")
+
+        mock_client.collect.return_value = failing_generator()
+
+        # Should NOT raise — the exception should be caught
+        manager._collect_from_client(
+            mock_client,
+            "Redmine",
+            persistence=persistence,
+            overwrite=False,
+            projects=["test-project"],
+        )
+
+        # First resource should have been persisted
+        assert persistence.persist_resource.call_count == 1
+
+    def test_immediate_exception_caught(self):
+        """If client.collect() raises immediately (not a generator),
+        the error should still be caught."""
+        manager = self._make_manager()
+        persistence = MagicMock()
+
+        mock_client = MagicMock()
+        mock_client.collect.side_effect = ConnectionError("connection refused")
+
+        # Should NOT raise
+        manager._collect_from_client(
+            mock_client,
+            "JIRA",
+            persistence=persistence,
+            overwrite=False,
+            projects=["test-project"],
+        )
+        assert persistence.persist_resource.call_count == 0
+
+    def test_successful_collection(self):
+        """Verify normal collection works: all resources persisted."""
+        manager = self._make_manager()
+        persistence = MagicMock()
+
+        mock_client = MagicMock()
+        resources = [MagicMock(), MagicMock(), MagicMock()]
+        mock_client.collect.return_value = iter(resources)
+
+        manager._collect_from_client(
+            mock_client,
+            "JIRA",
+            persistence=persistence,
+            overwrite=False,
+            projects=["proj-a", "proj-b"],
+        )
+        assert persistence.persist_resource.call_count == 3
+
+    def test_none_client_skipped(self):
+        """If client is None, nothing should happen."""
+        manager = self._make_manager()
+        persistence = MagicMock()
+
+        manager._collect_from_client(
+            None,
+            "Redmine",
+            persistence=persistence,
+            overwrite=False,
+            projects=["test"],
+        )
+        assert persistence.persist_resource.call_count == 0
+
+    def test_jira_projects_tracked(self):
+        """After JIRA collection, projects should be added to jira_projects set."""
+        manager = self._make_manager()
+        persistence = MagicMock()
+
+        mock_client = MagicMock()
+        mock_client.collect.return_value = iter([])
+
+        manager._collect_from_client(
+            mock_client,
+            "JIRA",
+            persistence=persistence,
+            overwrite=False,
+            projects=["my-project"],
+        )
+        assert "my-project" in manager.jira_projects
+
+    def test_redmine_projects_tracked(self):
+        """After Redmine collection, projects should be added to redmine_projects set."""
+        manager = self._make_manager()
+        persistence = MagicMock()
+
+        mock_client = MagicMock()
+        mock_client.collect.return_value = iter([])
+
+        manager._collect_from_client(
+            mock_client,
+            "Redmine",
+            persistence=persistence,
+            overwrite=False,
+            projects=["infra-project"],
+        )
+        assert "infra-project" in manager.redmine_projects
diff --git a/tests/unit/test_tool_error_handling.py b/tests/unit/test_tool_error_handling.py
new file mode 100644
index 000000000..d2f960f87
--- /dev/null
+++ b/tests/unit/test_tool_error_handling.py
@@ -0,0 +1,240 @@
+"""Unit tests for tool error handling.
+
+Tests RemoteCatalogClient redirect detection (regression for bug #12),
+HTTP error handling, and timeouts.
+"""
+
+from unittest.mock import MagicMock, patch
+
+import pytest
+import requests
+
+from src.archi.pipelines.agents.tools.local_files import RemoteCatalogClient
+
+# ── Helpers ───────────────────────────────────────────────────────────────
+
+
+def _make_response(status_code, *, json_data=None, headers=None, is_redirect=False):
+    """Create a mock requests.Response."""
+    resp = MagicMock(spec=requests.Response)
+    resp.status_code = status_code
+    resp.is_redirect = is_redirect
+    resp.headers = headers or {}
+    if json_data is not None:
+        resp.json.return_value = json_data
+    resp.raise_for_status = MagicMock()
+    if status_code >= 400:
+        resp.raise_for_status.side_effect = requests.HTTPError(
+            response=resp, request=MagicMock()
+        )
+    return resp
+
+
+# ── Tests: Redirect detection ────────────────────────────────────────────
+
+
+class TestRedirectDetection:
+    """Regression for bug #12: catalog API returning 302 → login page
+    was silently parsed as JSON, causing a confusing error."""
+
+    def test_search_302_raises_runtime_error(self):
+        """302 redirect on search must raise RuntimeError with clear message."""
+        client = RemoteCatalogClient(base_url="http://test:7871")
+        resp = _make_response(
+            302, headers={"Location": "http://test:7871/login"}, is_redirect=True
+        )
+
+        with patch("requests.get", return_value=resp):
+            with pytest.raises(RuntimeError, match="redirected.*DM_API_TOKEN"):
+                client.search("test query")
+
+    def test_search_301_raises_runtime_error(self):
+        client = RemoteCatalogClient(base_url="http://test:7871")
+        resp = _make_response(301, headers={"Location": "/login"}, is_redirect=False)
+
+        with patch("requests.get", return_value=resp):
+            with pytest.raises(RuntimeError, match="redirected"):
+                client.search("test query")
+
+    def test_search_307_raises_runtime_error(self):
+        client = RemoteCatalogClient(base_url="http://test:7871")
+        resp = _make_response(307, headers={"Location": "/auth"}, is_redirect=True)
+
+        with patch("requests.get", return_value=resp):
+            with pytest.raises(RuntimeError, match="redirected"):
+                client.search("test query")
+
+    def test_get_document_302_raises(self):
+        client = RemoteCatalogClient(base_url="http://test:7871")
+        resp = _make_response(302, headers={"Location": "/login"}, is_redirect=True)
+
+        with patch("requests.get", return_value=resp):
+            with pytest.raises(RuntimeError, match="redirected"):
+                client.get_document("abc123")
+
+    def test_schema_302_raises(self):
+        client = RemoteCatalogClient(base_url="http://test:7871")
+        resp = _make_response(302, headers={"Location": "/login"}, is_redirect=True)
+
+        with patch("requests.get", return_value=resp):
+            with pytest.raises(RuntimeError, match="redirected"):
+                client.schema()
+
+
+# ── Tests: Successful responses ──────────────────────────────────────────
+
+
+class TestSuccessfulResponses:
+    """Verify normal operations continue to work."""
+
+    def test_search_200_returns_hits(self):
+        client = RemoteCatalogClient(base_url="http://test:7871")
+        resp = _make_response(
+            200, json_data={"hits": [{"hash": "abc", "path": "/test.md"}]}
+        )
+
+        with patch("requests.get", return_value=resp):
+            results = client.search("test")
+
+        assert len(results) == 1
+        assert results[0]["hash"] == "abc"
+
+    def test_search_200_empty_hits(self):
+        client = RemoteCatalogClient(base_url="http://test:7871")
+        resp = _make_response(200, json_data={"hits": []})
+
+        with patch("requests.get", return_value=resp):
+            results = client.search("nothing")
+
+        assert results == []
+
+    def test_get_document_200(self):
+        client = RemoteCatalogClient(base_url="http://test:7871")
+        resp = _make_response(200, json_data={"text": "hello", "metadata": {}})
+
+        with patch("requests.get", return_value=resp):
+            result = client.get_document("abc123")
+
+        assert result["text"] == "hello"
+
+    def test_get_document_404_returns_none(self):
+        client = RemoteCatalogClient(base_url="http://test:7871")
+        resp = _make_response(404)
+
+        with patch("requests.get", return_value=resp):
+            result = client.get_document("missing")
+
+        assert result is None
+
+    def test_schema_200(self):
+        client = RemoteCatalogClient(base_url="http://test:7871")
+        resp = _make_response(
+            200, json_data={"keys": ["source_type"], "source_types": ["git"]}
+        )
+
+        with patch("requests.get", return_value=resp):
+            result = client.schema()
+
+        assert "keys" in result
+
+
+# ── Tests: HTTP errors ───────────────────────────────────────────────────
+
+
+class TestHTTPErrors:
+    """Verify proper error propagation for HTTP failures."""
+
+    def test_search_500_raises(self):
+        client = RemoteCatalogClient(base_url="http://test:7871")
+        resp = _make_response(500)
+
+        with patch("requests.get", return_value=resp):
+            with pytest.raises(requests.HTTPError):
+                client.search("test")
+
+    def test_search_timeout_raises(self):
+        client = RemoteCatalogClient(base_url="http://test:7871", timeout=0.1)
+
+        with patch("requests.get", side_effect=requests.Timeout("timed out")):
+            with pytest.raises(requests.Timeout):
+                client.search("test")
+
+    def test_connection_error_raises(self):
+        client = RemoteCatalogClient(base_url="http://test:7871")
+
+        with patch("requests.get", side_effect=requests.ConnectionError("refused")):
+            with pytest.raises(requests.ConnectionError):
+                client.search("test")
+
+
+# ── Tests: Client construction ───────────────────────────────────────────
+
+
+class TestClientConstruction:
+    """Verify RemoteCatalogClient configuration."""
+
+    def test_api_token_sets_auth_header(self):
+        client = RemoteCatalogClient(base_url="http://test:7871", api_token="secret")
+        assert client._headers["Authorization"] == "Bearer secret"
+
+    def test_no_api_token_no_auth_header(self):
+        client = RemoteCatalogClient(base_url="http://test:7871")
+        assert "Authorization" not in client._headers
+
+    def test_host_mode_uses_localhost(self):
+        with patch.dict("os.environ", {"HOST_MODE": "true"}):
+            client = RemoteCatalogClient(port=7871)
+        assert "localhost" in client.base_url
+
+    def test_non_host_mode_uses_data_manager(self):
+        with patch.dict("os.environ", {}, clear=True):
+            # Clear all HOST_MODE variants
+            import os
+
+            for key in ["HOST_MODE", "HOSTMODE", "ARCHI_HOST_MODE"]:
+                os.environ.pop(key, None)
+            client = RemoteCatalogClient(port=7871)
+        assert "data-manager" in client.base_url
+
+    @patch(
+        "src.archi.pipelines.agents.tools.local_files.read_secret",
+        return_value="dm-token-123",
+    )
+    def test_from_deployment_config(self, mock_read_secret):
+        config = {
+            "host_mode": True,
+            "services": {
+                "data_manager": {
+                    "port": 7871,
+                }
+            },
+        }
+        client = RemoteCatalogClient.from_deployment_config(config)
+        assert client._headers.get("Authorization") == "Bearer dm-token-123"
+
+
+# ── Tests: Tool factory error handling ───────────────────────────────────
+
+
+class TestToolFactoryErrorHandling:
+    """Verify tool functions handle catalog errors gracefully."""
+
+    def test_file_search_tool_catches_catalog_exception(self):
+        """When catalog.search() raises, the tool should return an error
+        message string, not propagate the exception."""
+        client = MagicMock(spec=RemoteCatalogClient)
+        client.search.side_effect = RuntimeError("redirected to /login")
+
+        # Import the tool factory
+        from src.archi.pipelines.agents.tools.local_files import \
+            create_file_search_tool
+
+        tool = create_file_search_tool(client)
+
+        # LangChain tool.invoke() should return error message
+        result = tool.invoke({"query": "test"})
+        assert (
+            "failed" in result.lower()
+            or "error" in result.lower()
+            or "unavailable" in result.lower()
+        )