29 commits
5527812
replace agent runtime with Copilot SDK
JasonMoho Mar 23, 2026
def1de4
add unit tests for Copilot SDK adapter and pipeline helpers
JasonMoho Mar 23, 2026
cb9a79c
fix critical bugs in copilot SDK integration
JasonMoho Mar 24, 2026
bc19330
fix tool failures: generator error handling + catalog auth redirect d…
JasonMoho Mar 24, 2026
1a07e6f
copilot sdk: fix tool restrictions, session resume, event adapter edg…
JasonMoho Mar 27, 2026
ecf92f0
add unit, playwright, and smoke tests for copilot sdk pipeline
JasonMoho Mar 27, 2026
2b7c4fa
fix post-merge regressions: tests and production bugs
JasonMoho Mar 27, 2026
78ff9ab
fix tool name/args showing as unknown in UI
JasonMoho Mar 27, 2026
c7669ff
security: switch SDK tool restrictions to allowlist, fix multi-turn h…
JasonMoho Mar 27, 2026
6fb9157
tests: add copilot SDK pipeline test suite
JasonMoho Mar 27, 2026
0d3948e
security: switch to tool allowlist, fix multi-turn history, fix sessi…
JasonMoho Mar 27, 2026
1051052
revert formatting-only changes and remove unrelated test_ticket_manager
JasonMoho Mar 27, 2026
96529f6
revert remaining format-only changes, keep functional fixes
JasonMoho Mar 28, 2026
f192dfc
revert formatting noise from app.py, api.py, base_react.py, cli_main.…
JasonMoho Mar 28, 2026
ed0e3d3
update PR description to match actual diff scope
JasonMoho Mar 28, 2026
f0678d3
remove PR description from tracked files
JasonMoho Mar 28, 2026
d8531d2
fix tools_smoke: @define_tool returns Tool, not callable
JasonMoho Mar 28, 2026
0f20b72
add CopilotAgentPipeline smoke test to CI
JasonMoho Mar 28, 2026
e673d01
fix DocumentCollector.store -> store_docs in tools_smoke.py
JasonMoho Mar 28, 2026
8341360
make Copilot SDK smoke step non-fatal (requires auth)
JasonMoho Mar 28, 2026
be522e5
harmonize session ID
lucalavezzo Mar 31, 2026
3ce245b
address PR review quick fixes
JasonMoho Mar 31, 2026
0e3e27e
move copilot SDK files to pipelines/copilot_agents/, remove slop prompt
JasonMoho Mar 31, 2026
ee8fdb8
drop tool name alias, unify retriever helpers, trim slop tests
JasonMoho Mar 31, 2026
c7d767b
keep search_vectorstore_hybrid as canonical tool name
JasonMoho Mar 31, 2026
63a62a9
trigger CI
JasonMoho Mar 31, 2026
0d03288
fix rebase conflicts: remove stale Gemini ref, update tests for threa…
JasonMoho Mar 31, 2026
9542d81
fix insert_conversation signature mismatch from rebase
JasonMoho Apr 1, 2026
c35e18d
restore current_model_used/current_pipeline_used attrs on ChatWrapper
JasonMoho Apr 1, 2026
33 changes: 30 additions & 3 deletions .github/workflows/pr-preview.yml
@@ -48,7 +48,7 @@ jobs:
python -m pip install --upgrade pip
pip install . || true
pip install -r requirements/requirements-base.txt
- pip install pytest
+ pip install pytest pytest-asyncio

- name: Run unit tests
run: python -m pytest tests/unit/ -v --tb=short
@@ -255,12 +255,39 @@ jobs:
path: playwright-report/
retention-days: 14

- # ── Cleanup ─────────────────────────────────────────────────────────
- - name: Cleanup smoke deployment
+ # ── Copilot SDK smoke (BYOK via local Ollama) ─────────────────────
+ # NOTE: The Copilot SDK requires GitHub auth even in BYOK mode.
+ # This step validates build + boot but react_smoke will time out
+ # until CI has a Copilot-authenticated token. Non-fatal for now.
+ - name: Cleanup CMSCompOps deployment before Copilot smoke
if: ${{ always() }}
run: |
yes | archi delete --name ci-${{ github.run_id }} || true

- name: Run Copilot SDK smoke deployment
continue-on-error: true
uses: ./.github/actions/run-smoke
with:
deployment-name: ci-copilot-${{ github.run_id }}
config-path: tests/pr_preview_config/pr_preview_copilot_config.yaml
config-destination: configs/ci/ci_copilot_config.generated.yaml
services: chatbot
hostmode: "true"
wait-url: http://localhost:2786/api/health
base-url: http://localhost:2786
extra-env: |
ARCHI_COMPOSE_UP_FLAGS=--build --force-recreate
SMOKE_OLLAMA_MODEL=qwen3:4b
SMOKE_OLLAMA_URL=http://localhost:11434
SMOKE_OLLAMA_HOST=http://localhost:11434
use-podman: "false"

# ── Cleanup ─────────────────────────────────────────────────────────
- name: Cleanup smoke deployments
if: ${{ always() }}
run: |
yes | archi delete --name ci-copilot-${{ github.run_id }} || true

- name: Cleanup local base images
if: ${{ always() && needs.build-base-images.outputs.changed == 'true' }}
run: |
1 change: 1 addition & 0 deletions .gitignore
@@ -37,3 +37,4 @@ git/
local_files/
raw_local_files/
websites/
.env_tmp_smoke
89 changes: 89 additions & 0 deletions docs/multi-backend-agent-recommendation.md
@@ -0,0 +1,89 @@
# Multi-Backend Agent Abstraction: Recommendation

**Date:** March 25, 2026
**Question:** Should A2rchi support a general agent backend (Copilot SDK, Claude Agent SDK, LangChain) or lock into the Copilot SDK?

**Verdict: Lock into Copilot SDK. A general abstraction is feasible but not advisable.**

## Side-by-Side Comparison

| Dimension | Copilot SDK | Claude Agent SDK | LangChain |
|---|---|---|---|
| **Runtime** | CLI subprocess (`copilot --headless`) | CLI subprocess (`claude` CLI) | In-process graph |
| **Tool definition** | `defineTool(name, {description, parameters: JSONSchema, handler})` | `@tool(name, desc, schema)` → must return MCP `{"content": [...]}` | `@tool` decorator, returns `str` |
| **Streaming** | Event callbacks: `session.on("event_type", handler)` | Async iterator: `async for msg in query()` | State generator: `for chunk in agent.stream()` |
| **Session** | `createSession()` → `sendAndWait()`, managed by CLI | `query()` (stateless) or `ClaudeSDKClient` (sessioned), managed by CLI | `invoke(state)`, state is external (you manage it) |
| **Models** | GPT-4.1 default, BYOK for OpenAI/Azure/Anthropic/Google/Mistral | Claude only, BYOK via Bedrock/Vertex/Azure AI Foundry | Any provider via `init_chat_model()` |
| **Hooks** | `onPreToolUse`, `onPostToolUse`, session lifecycle | `PreToolUse`, `PostToolUse`, `PermissionRequest`, etc. | Middleware: `@before_model`, `@after_model`, `@wrap_tool_call` |
| **Auth** | GitHub OAuth, env vars, BYOK | Anthropic API key, Bedrock, Vertex | Per-model provider keys |

## Key Issues With a General Abstraction

### 1. Tool return format mismatch

Claude Agent SDK enforces the MCP wire format: tools must return `{"content": [{"type": "text", "text": "..."}]}`. Copilot tools return any serializable value. LangChain tools return strings. Every tool needs a per-backend wrapper that normalizes both input schemas and return formats, so our 7 tools across 3 backends become 21 adapter functions.
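
To make the wrapper cost concrete, here is a minimal sketch of the per-backend normalization for a single tool. The names `search_docs`, `for_copilot`, `for_claude`, and `for_langchain` are hypothetical and belong to neither SDK nor the A2rchi codebase; only the three return shapes come from the comparison above. Real handlers would also need each SDK's own registration decorator and may have to be async.

```python
import json
from typing import Any, Callable, Dict

ToolFn = Callable[..., Any]


def search_docs(query: str) -> Dict[str, Any]:
    """Hypothetical backend-neutral tool returning a plain serializable value."""
    return {"query": query, "hits": ["doc-1", "doc-2"]}


def for_copilot(tool: ToolFn) -> ToolFn:
    # Copilot tools may return any serializable value, so pass it through.
    return tool


def for_claude(tool: ToolFn) -> ToolFn:
    # Claude Agent SDK tools must return MCP-style content blocks.
    def wrapped(*args: Any, **kwargs: Any) -> Dict[str, Any]:
        text = json.dumps(tool(*args, **kwargs))
        return {"content": [{"type": "text", "text": text}]}
    return wrapped


def for_langchain(tool: ToolFn) -> ToolFn:
    # LangChain tools return strings.
    def wrapped(*args: Any, **kwargs: Any) -> str:
        return json.dumps(tool(*args, **kwargs))
    return wrapped


# One tool, three wrappers (assumed adapter registry, for illustration only).
ADAPTERS = {"copilot": for_copilot, "claude": for_claude, "langchain": for_langchain}
wrapped_tools = {backend: adapt(search_docs) for backend, adapt in ADAPTERS.items()}
```

Repeating this for every tool is where the adapter count comes from; input-schema normalization, which this sketch omits, would add to it.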

### 2. Both Copilot and Claude SDKs are CLI wrappers

They spawn a subprocess and communicate over stdio/TCP. LangChain runs fully in-process. This means:

- Two separate CLI binaries in your Docker image
- Two different auth flows (GitHub OAuth vs Anthropic API key)
- Two different process lifecycle managers
- LangChain requires none of this (but has completely different plumbing)

### 3. Three incompatible streaming models

Our existing `copilot_event_adapter.py` is ~400 lines of code that translate Copilot's event callbacks into `PipelineOutput` objects. We'd need an equivalent adapter for each backend, each handling different event types, different data shapes, and different async patterns (callbacks vs async iterators vs sync generators).
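
The three async patterns can be illustrated with a toy bridge that turns each one into the same synchronous stream. This is a sketch under stated assumptions: `PipelineOutput` here is a stand-in dataclass, not the real class, and `from_callbacks`, `from_async_iter`, `from_sync_gen`, and the fake sources are hypothetical names. It buffers events instead of streaming them live, which a real adapter could not afford to do.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Callable, Iterator, List


@dataclass
class PipelineOutput:
    """Stand-in for the real PipelineOutput; carries only text here."""
    text: str


def from_callbacks(run: Callable[[Callable[[str], None]], None]) -> Iterator[PipelineOutput]:
    # Callback style: collect events pushed to a handler, then replay them.
    events: List[str] = []
    run(events.append)
    for text in events:
        yield PipelineOutput(text)


def from_async_iter(make_stream: Callable[[], AsyncIterator[str]]) -> Iterator[PipelineOutput]:
    # Async-iterator style: drain the stream on a private event loop.
    async def drain() -> List[str]:
        return [msg async for msg in make_stream()]

    for text in asyncio.run(drain()):
        yield PipelineOutput(text)


def from_sync_gen(chunks: Iterator[str]) -> Iterator[PipelineOutput]:
    # Sync-generator style: already iterable, only the shape changes.
    for text in chunks:
        yield PipelineOutput(text)


def fake_callback_run(handler: Callable[[str], None]) -> None:
    # Hypothetical callback-driven source.
    for text in ("a", "b"):
        handler(text)


async def fake_async_stream() -> AsyncIterator[str]:
    # Hypothetical async-iterator source.
    for text in ("a", "b"):
        yield text
```

Even in this reduced form each style needs its own entry point, and none of the three handles tool events, errors, or cancellation.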

### 4. Claude Agent SDK BYOK is provider-level, not model-level

The Claude Agent SDK does support BYOK via Amazon Bedrock, Google Vertex AI, and Microsoft Azure AI Foundry. But this means "bring your own cloud credentials to access **Claude models**" — not "bring your own key to use any model." You're still restricted to Claude (Sonnet, Opus, Haiku). Copilot SDK's BYOK lets you swap between entirely different model families (GPT-4.1, Claude, Gemini, Mistral). A2rchi's multi-provider model selection would not work through the Claude Agent SDK.

### 5. Session lifecycle is fundamentally different

Copilot and Claude manage sessions inside their CLI process (persist, resume, fork). LangChain has no built-in session — you provide state via checkpointers. Abstracting over "session" means accepting the lowest common denominator: no resume, no persistence, no fork.

### 6. A lowest-common-denominator (LCD) abstraction strips unique value from each SDK

- **Copilot:** Custom agents, skills, system message section overrides (replace/remove/append per section) — can't express through an abstraction
- **Claude:** Permission system, sandbox, file checkpointing, subagents — not available in others
- **LangChain:** Middleware pipeline, dynamic model selection, structured output strategies — completely different paradigm

## The Math

Each additional backend requires:

| Component | LOC |
|---|---|
| Event/streaming adapter | ~400 |
| Tool wrappers (7 tools × format normalization) | ~200 |
| Session lifecycle management | ~300 |
| Auth/config integration | ~150 |
| **Total per backend** | **~1,050** |

Plus ongoing maintenance when any SDK ships breaking changes.
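
The total can be checked in a few lines. The figures are the table's own estimates; scaling linearly to two extra backends is an assumption, since some glue code might be shared.

```python
# Per-backend glue-code estimates, copied from the table above.
PER_BACKEND_LOC = {
    "event/streaming adapter": 400,
    "tool wrappers": 200,
    "session lifecycle management": 300,
    "auth/config integration": 150,
}


def glue_loc(extra_backends: int) -> int:
    """Estimated glue code for a number of additional backends (linear assumption)."""
    return extra_backends * sum(PER_BACKEND_LOC.values())


print(glue_loc(1))  # 1050
print(glue_loc(2))  # 2100
```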

## Why the Architecture Already Supports a Future Pivot

The current architecture is already well-separated:

- **`archi.py`** is 100% backend-agnostic — it calls `pipeline.stream()` and validates `PipelineOutput`
- The pipeline factory (`getattr(archiPipelines, class_name)`) lets you add a `LangChainAgentPipeline` or `ClaudeAgentPipeline` as a new pipeline class without touching any shared code
- **`PipelineOutput`** is the universal contract — any new backend just needs to yield these

No premature abstraction layer needed. When the time comes, you add a new pipeline class.
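
A minimal sketch of that extension path, assuming the factory really is a plain `getattr` lookup by class name. Everything here is a stand-in: the `archiPipelines` namespace, the `PipelineOutput` dataclass, both pipeline classes, and the `build_pipeline` and `run` helpers are hypothetical and only mirror the names used above.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Iterator, List


@dataclass
class PipelineOutput:
    """Stand-in for the shared output contract."""
    answer: str


class CopilotAgentPipeline:
    def stream(self, question: str) -> Iterator[PipelineOutput]:
        yield PipelineOutput(answer=f"copilot: {question}")


class LangChainAgentPipeline:
    """Hypothetical second backend; it only has to yield PipelineOutput."""
    def stream(self, question: str) -> Iterator[PipelineOutput]:
        yield PipelineOutput(answer=f"langchain: {question}")


# Stand-in for the module the factory looks pipeline classes up on.
archiPipelines = SimpleNamespace(
    CopilotAgentPipeline=CopilotAgentPipeline,
    LangChainAgentPipeline=LangChainAgentPipeline,
)


def build_pipeline(class_name: str):
    # Same lookup as the factory described above: resolve the class by name.
    return getattr(archiPipelines, class_name)()


def run(class_name: str, question: str) -> List[PipelineOutput]:
    # Caller side: stream, then validate that every item honors the contract.
    outputs = list(build_pipeline(class_name).stream(question))
    for item in outputs:
        if not isinstance(item, PipelineOutput):
            raise TypeError(f"unexpected output type: {type(item).__name__}")
    return outputs
```

Switching backends is then a configuration change, for example `run("LangChainAgentPipeline", "hi")`, with no edits to the caller.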

## If You Ever Need a Second Backend

**LangChain is the better addition** (not Claude Agent SDK) because:

1. It runs in-process (no CLI dependency)
2. It supports any model provider
3. Its `@tool` decorator is closest to Copilot's `defineTool`

But even then, it's ~1,000+ LOC of glue code for marginal value — the same users who want "Anthropic models" already get them through Copilot SDK's BYOK.

## Recommendation

Stay on the Copilot SDK. Build the second backend only when a concrete use case demands it — the architecture is ready.
2 changes: 2 additions & 0 deletions examples/agents/cms-comp-ops.md
@@ -4,6 +4,8 @@ tools:
- search_vectorstore_hybrid
- search_local_files
- search_metadata_index
- list_metadata_schema
- fetch_catalog_document
---

You are the CMS Comp Ops assistant. You help with operational questions, troubleshooting,
5 changes: 3 additions & 2 deletions pyproject.toml
@@ -2,7 +2,7 @@
name = "archi"
version = "1.2.4"
description = "An AI Augmented Research Chat Intelligence (archi)"
- requires-python = ">=3.7"
+ requires-python = ">=3.10"
authors = [
{name="Pietro Lugato", email="pmlugato@mit.edu"},
{name="Julius Heitkoetter", email="juliush@mit.edu"},
@@ -14,7 +14,7 @@ authors = [
]
dependencies = [
"pyyaml==6.0.1",
- "click==8.1.7",
+ "click>=8.1.7",
"jinja2==3.1.3",
"requests==2.31.0",
"podman-compose==1.4.0",
@@ -48,6 +48,7 @@ build-backend = "setuptools.build_meta"
[tool.pytest.ini_options]
testpaths = ["tests/unit"]
addopts = "-v --tb=short"
asyncio_mode = "auto"

[project.urls]
"Homepage" = "https://github.com/archi-physics/archi"
1 change: 1 addition & 0 deletions requirements/requirements-base.txt
@@ -18,6 +18,7 @@ httptools==0.6.1
httpx==0.27.2
humanfriendly==10.0
croniter==2.0.5
github-copilot-sdk>=0.2.0
langgraph==1.0.2
langchain-mcp-adapters==0.1.11
langchain==1.0.3
6 changes: 4 additions & 2 deletions src/archi/pipelines/__init__.py
@@ -1,11 +1,12 @@
"""Pipeline package exposing the available pipeline classes."""

from .agents.base_react import BaseReActAgent
from .agents.cms_comp_ops_agent import CMSCompOpsAgent
from .classic_pipelines.base import BasePipeline
from .classic_pipelines.grading import GradingPipeline
from .classic_pipelines.image_processing import ImageProcessingPipeline
from .classic_pipelines.qa import QAPipeline
from .agents.base_react import BaseReActAgent
from .agents.cms_comp_ops_agent import CMSCompOpsAgent
from .copilot_agents.copilot_agent import CopilotAgentPipeline

__all__ = [
"BasePipeline",
@@ -14,4 +15,5 @@
"QAPipeline",
"BaseReActAgent",
"CMSCompOpsAgent",
"CopilotAgentPipeline",
]
6 changes: 5 additions & 1 deletion src/archi/pipelines/agents/base_react.py
@@ -19,7 +19,7 @@
from src.archi.providers.base import ProviderType
from src.archi.utils.output_dataclass import PipelineOutput
from src.archi.pipelines.agents.utils.run_memory import RunMemory
- from src.archi.pipelines.agents.utils.mcp_utils import AsyncLoopThread
+ from src.archi.utils.async_loop import AsyncLoopThread
from src.archi.pipelines.agents.tools import initialize_mcp_client
from src.utils.logging import get_logger

@@ -79,6 +79,10 @@ def create_run_memory(self) -> RunMemory:
"""Instantiate a fresh run memory for an agent run."""
return RunMemory()

def supports_persisted_session_id(self) -> bool:
"""Classic ReAct agents are stateless beyond the provided history."""
return False

def start_run_memory(self) -> RunMemory:
"""Create and store the active memory for the current run."""
memory = self.create_run_memory()
18 changes: 18 additions & 0 deletions src/archi/pipelines/agents/tools/local_files.py
@@ -108,7 +108,13 @@ def search(
params=params,
headers=self._headers,
timeout=self.timeout,
allow_redirects=False,
)
if resp.is_redirect or resp.status_code in (301, 302, 303, 307, 308):
raise RuntimeError(
f"Catalog API redirected to {resp.headers.get('Location', '?')} — "
"check DM_API_TOKEN or data_manager auth config"
)
resp.raise_for_status()
data = resp.json()
return data.get("hits", []) or []
@@ -119,9 +125,15 @@ def get_document(self, resource_hash: str, *, max_chars: int = 4000) -> Optional
params={"max_chars": max_chars},
headers=self._headers,
timeout=self.timeout,
allow_redirects=False,
)
if resp.status_code == 404:
return None
if resp.is_redirect or resp.status_code in (301, 302, 303, 307, 308):
raise RuntimeError(
f"Catalog API redirected to {resp.headers.get('Location', '?')} \u2014 "
"check DM_API_TOKEN or data_manager auth config"
)
resp.raise_for_status()
return resp.json()

@@ -130,7 +142,13 @@ def schema(self) -> Dict[str, object]:
f"{self.base_url}/api/catalog/schema",
headers=self._headers,
timeout=self.timeout,
allow_redirects=False,
)
if resp.is_redirect or resp.status_code in (301, 302, 303, 307, 308):
raise RuntimeError(
f"Catalog API redirected to {resp.headers.get('Location', '?')} \u2014 "
"check DM_API_TOKEN or data_manager auth config"
)
resp.raise_for_status()
return resp.json()

2 changes: 1 addition & 1 deletion src/archi/pipelines/agents/tools/retriever.py
@@ -62,7 +62,7 @@ def _format_documents_for_llm(
def create_retriever_tool(
retriever: BaseRetriever,
*,
- name: str = "search_knowledge_base",
+ name: str = "search_vectorstore_hybrid",
description: Optional[str] = None,
max_documents: int = 4,
max_chars: int = 800,
5 changes: 5 additions & 0 deletions src/archi/pipelines/copilot_agents/__init__.py
@@ -0,0 +1,5 @@
"""Copilot SDK agent package."""

from .copilot_agent import CopilotAgentPipeline

__all__ = ["CopilotAgentPipeline"]