Conversation
There was a problem hiding this comment.
Pull request overview
This PR replaces the existing LangGraph agent runtime with a new Copilot SDK–based agent pipeline while keeping the legacy pipeline selectable via config, and updates streaming/tooling/agent-editor integrations accordingly.
Changes:
- Added
CopilotAgentPipeline+CopilotEventAdapterto run Copilot SDK sessions and translate streaming events intoPipelineOutput. - Introduced Copilot SDK tool wrappers and a central
TOOL_REGISTRY, plus new async-loop bridging for Flask ↔ async SDK calls. - Expanded unit/UI/smoke coverage and adjusted deployment/runtime configuration (Python 3.11 base image, Copilot CLI installation, Copilot smoke pipeline config).
Reviewed changes
Copilot reviewed 47 out of 47 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/test_tool_error_handling.py | New unit tests for catalog tool error/redirect/timeout handling. |
| tests/unit/test_ticket_manager.py | New regression tests for ticket ingestion generator error handling. |
| tests/unit/test_pipeline_integration.py | Integration tests for pipeline→adapter→output wiring and override behavior. |
| tests/unit/test_import_sanity.py | Regression test to detect circular imports in key modules. |
| tests/unit/test_copilot_pipeline.py | Unit tests for provider mapping, MCP passthrough, tool allowlisting, and resume logic. |
| tests/unit/test_copilot_event_adapter.py | Unit tests for Copilot streaming event translation into PipelineOutput. |
| tests/unit/test_chat_wrapper_stream.py | Unit tests validating NDJSON event routing from PipelineOutput through ChatWrapper.stream(). |
| tests/unit/test_byok.py | Updates tests to reflect provider registry changes (Gemini removal). |
| tests/unit/test_adapter_error_paths.py | Regression tests for adapter deadlock/timeout/cancellation paths. |
| tests/ui/workflows/22-copilot-streaming.spec.ts | Playwright workflow covering Copilot streaming edge cases (tools/thinking/errors/usage). |
| tests/ui/workflows/21-agent-management.spec.ts | Playwright workflow for agent CRUD and editor flows. |
| tests/ui/fixtures.ts | UI test fixtures updated with agent endpoints + template/tool palette data. |
| tests/test_pipeline_matrix.py | Standalone parity matrix runner comparing LangGraph vs Copilot pipeline behaviors. |
| tests/smoke/tools_smoke.py | Smoke checks updated for new tool factories/registry and direct ops validation. |
| tests/smoke/stream_test.py | New local smoke script to inspect NDJSON streaming event sequences. |
| tests/smoke/react_smoke.py | Enhanced streaming smoke validation for chunks/tools/usage. |
| tests/smoke/deploy_preflight.py | New deployment preflight validations (env, services, vectorstore, code sentinel). |
| tests/smoke/combined_smoke.sh | Runs preflight checks as part of the combined smoke flow. |
| tests/pr_preview_config/pr_preview_copilot_config.yaml | Adds PR preview config targeting CopilotAgentPipeline and local Ollama BYOK. |
| tests/_parse_stream.py | Helper to summarize NDJSON stream logs for debugging. |
| src/utils/config_service.py | Dynamic config update logic adjusted for active-agent clearing behavior. |
| src/interfaces/redmine_mailer_integration/redmine.py | Default redmine pipeline now points to CopilotAgentPipeline. |
| src/interfaces/chat_app/app.py | Streaming integration updates: provider/model overrides via kwargs; metadata-based tool-call extraction; error event handling. |
| src/interfaces/chat_app/api.py | Agent editor tool palette now returns {name, description} objects and excludes SDK built-ins. |
| src/data_manager/collectors/tickets/ticket_manager.py | Wraps generator iteration in try/except to prevent ingestion crashes on mid-iteration errors. |
| src/cli/templates/dockerfiles/base-python-image/requirements.txt | Adds Copilot SDK dependency to base Python image requirements. |
| src/cli/templates/dockerfiles/base-python-image/Dockerfile | Moves base image from Python 3.10 to 3.11. |
| src/cli/templates/dockerfiles/Dockerfile-chat | Installs Copilot CLI binary in the chat image for Copilot SDK runtime. |
| src/cli/templates/dockerfiles/Dockerfile-base | Moves base image from Python 3.10 to 3.11. |
| src/archi/utils/async_loop.py | Adds AsyncLoopThread singleton to run a persistent asyncio loop in a background thread. |
| src/archi/tools/retriever.py | New Copilot SDK tool wrapper for knowledge-base retrieval with doc formatting. |
| src/archi/tools/monit_search.py | New Copilot SDK MONIT OpenSearch tool wrappers. |
| src/archi/tools/file_search.py | New Copilot SDK catalog-backed file/metadata/schema/fetch tool wrappers. |
| src/archi/tools/document_collector.py | Adds shared per-request document collector for attribution/source docs. |
| src/archi/tools/init.py | Introduces central TOOL_REGISTRY (name → factory/description) for the new tool system. |
| src/archi/providers/init.py | Removes Gemini provider registration and name mapping. |
| src/archi/pipelines/copilot_agent.py | Implements CopilotAgentPipeline with BYOK provider mapping, session mgmt, MCP passthrough, and tool allowlisting. |
| src/archi/pipelines/agents/utils/mcp_utils.py | Compatibility shim re-exporting AsyncLoopThread from its new location. |
| src/archi/pipelines/agents/tools/local_files.py | Adds redirect detection and allow_redirects=False for catalog API calls. |
| src/archi/pipelines/init.py | Registers CopilotAgentPipeline in the pipeline package exports. |
| src/archi/copilot_event_adapter.py | Adds Copilot SDK event adapter translating SDK events into PipelineOutput and ensuring cleanup. |
| requirements/requirements-base.txt | Adds Copilot SDK dependency. |
| pyproject.toml | Raises minimum Python to >=3.10, loosens click pinning, and enables pytest asyncio auto mode. |
| examples/agents/cms-comp-ops.md | Updates example agent spec tool list (adds canonical names + new tools). |
| docs/multi-backend-agent-recommendation.md | New design recommendation doc for locking into Copilot SDK vs general abstraction. |
| .github/workflows/pr-preview.yml | CI updates: installs pytest-asyncio and adds (non-fatal) Copilot smoke deployment step. |
| .env_tmp_smoke | Adds a temporary env file used for smoke testing. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
.env_tmp_smoke
Outdated
| PG_PASSWORD=RCbGgoCCHrb4TGdJXcbVAdSviWf84+z3eJMWZac9Dcw= | ||
| OLLAMA_HOST=http://localhost:11434 |
There was a problem hiding this comment.
This file appears to contain environment-specific secrets/config (PG_PASSWORD) and should not be committed to the repository. Please remove it from version control and add it to .gitignore (or replace with a documented .env.example containing only placeholders).
| PG_PASSWORD=RCbGgoCCHrb4TGdJXcbVAdSviWf84+z3eJMWZac9Dcw= | |
| OLLAMA_HOST=http://localhost:11434 | |
| PG_PASSWORD=<PG_PASSWORD> | |
| OLLAMA_HOST=<OLLAMA_HOST_URL> |
| # Install GitHub Copilot CLI (used by the Copilot SDK agent runtime) | ||
| ARG COPILOT_CLI_VERSION=latest | ||
| RUN ARCH=$(uname -m) && \ | ||
| if [ "$ARCH" = "aarch64" ]; then CLI_ARCH="arm64"; else CLI_ARCH="x64"; fi && \ | ||
| if [ "$COPILOT_CLI_VERSION" = "latest" ]; then \ | ||
| DOWNLOAD_URL="https://github.com/github/copilot-cli/releases/latest/download/copilot-linux-${CLI_ARCH}.tar.gz"; \ | ||
| else \ | ||
| DOWNLOAD_URL="https://github.com/github/copilot-cli/releases/download/${COPILOT_CLI_VERSION}/copilot-linux-${CLI_ARCH}.tar.gz"; \ | ||
| fi && \ | ||
| wget -q -O /tmp/copilot.tar.gz "$DOWNLOAD_URL" && \ | ||
| tar -xzf /tmp/copilot.tar.gz -C /usr/local/bin && \ | ||
| chmod +x /usr/local/bin/copilot && \ | ||
| rm /tmp/copilot.tar.gz |
There was a problem hiding this comment.
COPILOT_CLI_VERSION defaults to latest, which makes container builds non-reproducible and can break deployments unexpectedly when upstream releases change. Suggestion: pin to a specific Copilot CLI version (and ideally verify a checksum/signature) so builds are deterministic.
src/archi/tools/monit_search.py
Outdated
| import requests as req_lib | ||
|
|
There was a problem hiding this comment.
build_monit_search_tool imports requests as req_lib but never uses it. This is dead code and can confuse readers/linting; remove the unused import (or use it if it’s meant to catch requests-specific exceptions).
| import requests as req_lib |
src/archi/tools/file_search.py
Outdated
| @define_tool(name=name, description=tool_description) | ||
| async def _fetch_document(params: DocumentFetchInput) -> str: | ||
| resource_hash = params.resource_hash | ||
| max_chars = params.max_chars |
There was a problem hiding this comment.
default_max_chars is accepted by build_document_fetch_tool(...) but never used (the tool always uses params.max_chars). Either remove the unused parameter or wire it into the DocumentFetchInput default / fallback logic to avoid misleading callers.
| max_chars = params.max_chars | |
| max_chars = params.max_chars or default_max_chars |
src/utils/config_service.py
Outdated
| if active_agent_name is not None: | ||
| updates.append("active_agent_name = %s") | ||
| params.append(active_agent_name) | ||
| params.append(active_agent_name if active_agent_name else None) |
There was a problem hiding this comment.
active_agent_name is being converted to None when an empty string is provided (active_agent_name if active_agent_name else None). That stores NULL in the DB, which contradicts the new call sites that pass "" to clear the active agent and the PR description (“empty string instead of NULL”). Suggestion: persist the empty string as-is (only treat None as “don’t update”), and if you need a dedicated “clear” behavior, handle that explicitly rather than collapsing "" → NULL here.
| params.append(active_agent_name if active_agent_name else None) | |
| params.append(active_agent_name) |
src/interfaces/chat_app/app.py
Outdated
| ts = datetime.now() | ||
| else: | ||
| ts = datetime.now() |
There was a problem hiding this comment.
In the metadata-based tool-call path, fallback timestamps use datetime.now() (naive) when created_at is missing or unparsable. Elsewhere in this function you use timezone-aware datetime.now(timezone.utc), so this introduces mixed naive/aware datetimes that can break DB inserts/comparisons. Use a timezone-aware default (e.g., datetime.now(timezone.utc)) consistently here too.
| ts = datetime.now() | |
| else: | |
| ts = datetime.now() | |
| ts = datetime.now(timezone.utc) | |
| else: | |
| ts = datetime.now(timezone.utc) |
| monit_index = "monit_prod_cms_rucio_raw_events*" | ||
| if _want("monit_opensearch_search"): | ||
| tools.append( | ||
| build_monit_search_tool( | ||
| self._monit_client, | ||
| tool_name="rucio_events_search", | ||
| index=monit_index, | ||
| skill=self._rucio_events_skill, | ||
| ) | ||
| ) | ||
| if _want("monit_opensearch_aggregation"): | ||
| tools.append( | ||
| build_monit_aggregation_tool( | ||
| self._monit_client, | ||
| tool_name="rucio_events_aggregation", | ||
| index=monit_index, | ||
| skill=self._rucio_events_skill, | ||
| ) | ||
| ) |
There was a problem hiding this comment.
The MONIT tools are selected by canonical names (monit_opensearch_search / monit_opensearch_aggregation), but the actual tools you build are renamed to rucio_events_search / rucio_events_aggregation. This makes tool selection/allowlisting inconsistent with TOOL_REGISTRY and the agent spec editor (which advertises the canonical names), and can lead to sessions that don’t expose the expected tool name. Suggestion: keep the tool names aligned with TOOL_REGISTRY (use the defaults), or update the registry/editor + selection logic so the same canonical name is used end-to-end.
lucalavezzo
left a comment
There was a problem hiding this comment.
Thanks a lot @JasonMoho, I have some small comments, mostly organizational + I'm not so sure about all these unit tests. I will also set up a test and let you know if I notice any other issues.
.env_tmp_smoke
Outdated
There was a problem hiding this comment.
can we just drop this? don't need backward comp
src/archi/pipelines/copilot_agent.py
Outdated
| "actually executed and returned that result. " | ||
| "If a user asks for shell or filesystem access that is not available via " | ||
| "the provided tools, say that capability is unavailable in this deployment." | ||
| ) |
There was a problem hiding this comment.
how robust is this? is it able to access local files in the container?
| _CopilotClient = None # type: ignore[assignment] | ||
|
|
||
|
|
||
| def _get_copilot_client_cls(): |
There was a problem hiding this comment.
I think this can be dropped
src/archi/pipelines/copilot_agent.py
Outdated
| } | ||
|
|
||
|
|
||
| def _canonicalize_tool_name(name: str) -> str: |
There was a problem hiding this comment.
i think this can be dropped/we should just pick one name and use it consistently?
There was a problem hiding this comment.
maybe this shuld be under a new dir here like copilot_agents?
There was a problem hiding this comment.
for all these tools, they are identicaly to the langchain ones + a decorator. any argument for keeping them separate? having one consistent tool across agents (with two decorated functions) could be easier to maintain
There was a problem hiding this comment.
I think this should also go in a more nexted directory along w/ the other copilot SDK stuff
There was a problem hiding this comment.
significant overlap with https://github.com/archi-physics/archi/blob/main/scripts/dev/run_smoke_preview.sh
this version is certainly cleaner, happy to drop old one
There was a problem hiding this comment.
do we need this? (and many of the other unit tests..)
There was a problem hiding this comment.
honestly I didn't look at all the unit tests very closely they looked a bit like ai slop
|
Pushed a small fix to the session ID managements. |
- fix deadlock: _run_session() now wraps body in try/except/finally so signal_done() is always called and errors are pushed to the queue - fix circular import: make TOOL_REGISTRY, DocumentCollector, and build_retriever_tool imports lazy in copilot_agent.py - fix queue hang: iter_outputs() uses poll_timeout (180s) as safety net - fix error propagation: add 'error' event handler in app.py so agent pipeline errors are logged and sent to the client - fix hook signatures: handlers accept (input_data, context) per SDK calling convention; use dict key access instead of getattr - fix tool name alias: map search_vectorstore_hybrid to search_knowledge_base - fix get_tool_registry: change from classmethod to instance method - fix redmine default: use correct default value - update tests: 121 passing (35 copilot-specific) - clean up temp SDK introspection scripts
…etection - Move for-loop inside try/except in _collect_from_client() so generator exceptions (e.g. Redmine AuthError) are caught instead of crashing the entire ingestion pipeline - Add allow_redirects=False and redirect detection to RemoteCatalogClient search/get_document/schema methods so auth failures produce clear errors instead of silently parsing the login HTML page as JSON
…e cases - Remove available_tools=[] that disabled all custom tools (root cause) - Clear session_id after resume failure to prevent reusing bad ID - Log warning for orphan tool.execution_complete events - Switch tool lifecycle from hooks to streaming events (native toolCallId) - Accumulate usage across multiple API calls instead of overwriting - Add anti-fabrication prompt and builtin tool blocklist - Permission gate allows only declared Archi custom tools - Support provider/model/api_key overrides via stream kwargs - Map empty active_agent_name to NULL in config service
- 208 unit tests: event adapter (16), pipeline (36), plus existing - 28 playwright tests: agent management CRUD (15), streaming edge cases (13) - Smoke test updates for copilot pipeline tool verification - Agent mock data and /api/agents/list mock in UI fixtures
- ticket_manager: wrap iteration loop in try/except for generator exceptions - app.py: fix output_tool_call_id fallback for copilot adapter path - test_ticket_manager: add overwrite param to match updated signature - test_chat_wrapper_stream: add current_model_used/current_pipeline_used to mock
The metadata-only path (Copilot adapter) for tool_start events was not calling _remember_tool_call() or updating emitted_tool_start_ids. This caused the tool_output handler to emit a duplicate tool_start with fallback 'unknown' name and empty args, overwriting the correct values.
…istory and session disconnect leak
- Replace excluded_tools blocklist with available_tools allowlist to block ALL
SDK built-in tools (sql, report_intent, bash, etc.) without enumeration
- Fix multi-turn conversation history loss by prepending prior turns for new
(non-resumed) sessions; _create_session returns (session, was_resumed) tuple
- Auto-append /v1 to base_url for local providers (Ollama compatibility)
- Move session disconnect cleanup to finally block to prevent connection leaks
- Return tool objects {name, description} from agent editor API endpoint
- Update tool restriction tests for allowlist approach
- test_tool_error_handling: catalog tool HTTP error handling - test_adapter_error_paths: event adapter edge cases - test_import_sanity: import smoke test - test_pipeline_integration: pipeline integration tests - test_pipeline_matrix: 38-test feature parity matrix - deploy_preflight: pre-deploy validation smoke test - stream_test: streaming smoke test
…on leak, add tests - Replace excluded_tools blocklist with available_tools allowlist to block ALL SDK built-in tools (sql, report_intent, bash, etc.) without enumeration - Fix multi-turn conversation history loss in Copilot pipeline by prepending prior turns for non-resumed sessions - Auto-append /v1 to local provider base_url for Ollama compatibility - Fix session disconnect leak: move cleanup to finally block - Return tool objects with descriptions from agent editor API - Format branch-changed files with black/isort - Add test files: pipeline matrix, adapter error paths, tool error handling, import sanity, pipeline integration, smoke tests
Remove 22 files that were reformatted by black/isort but had no functional changes, to reduce PR diff noise. Also remove test_ticket_manager.py which tests data ingestion and is unrelated to the Copilot SDK replacement.
Revert 14 files to origin/main that had only black/isort formatting. For 4 files with mixed changes, revert formatting and re-apply only the functional fixes: - test_byok: remove ProviderType.GEMINI assertion - redmine: default pipeline to CopilotAgentPipeline - config_service: map empty active_agent_name to NULL - ticket_manager: wrap iteration in try/except Re-add test_ticket_manager.py regression test.
…py, providers/__init__.py, local_files.py, and remove gemini test
Run a second smoke deployment after Playwright tests using CopilotAgentPipeline with BYOK against local Ollama (qwen3:4b). Reuses the same agent spec and ports; the CMSCompOps deployment is cleaned up first to free resources.
- remove .env_tmp_smoke from repo, add to .gitignore - pin copilot CLI version to v1.0.12 in Dockerfile-chat - remove unused requests import in monit_search.py - wire default_max_chars fallback in file_search.py - fix active_agent_name empty string collapse in config_service.py - fix naive datetime.now() to datetime.now(timezone.utc) in app.py - drop mcp_utils.py backward compat shim, update base_react.py import - simplify CopilotClient import (remove lazy caching) - remove custom MONIT tool_name overrides (use canonical registry names)
- move copilot_agent.py -> pipelines/copilot_agents/ - move copilot_event_adapter.py -> pipelines/copilot_agents/ - move tools/ -> pipelines/copilot_agents/tools/ - remove _NO_FAKE_TOOL_USE_INSTRUCTION (allowlist handles this) - update all imports across src and tests
- Remove _canonicalize_tool_name and rename search_vectorstore_hybrid to search_knowledge_base across all agent specs, docs, and code - SDK retriever now imports _normalize_results and _format_documents_for_llm from the LangGraph module (same pattern as file_search and monit_search) - Remove 6 trivial tests from test_byok.py (TestProviderKeyIntegration, TestProviderDisplayNames, TestModelInfo) - Remove TestToolNameAliases from test_copilot_pipeline.py
…d-safe params and removed _NO_FAKE_TOOL_USE_INSTRUCTION
c5b91a8 to
0d03288
Compare
Remove extra context arg at call site (7 args passed, method takes 6) and stale context.provider_used/model_used references in method body.
These were present on main but dropped during rebase. The non-streaming response path and insert_conversation rely on self.current_model_used and self.current_pipeline_used.
Replace LangGraph Agent with GitHub Copilot SDK
Summary
This PR replaces the LangGraph-based agent runtime with the GitHub Copilot SDK, providing a more maintainable and extensible agent pipeline while preserving full backward compatibility with the existing LangGraph pipeline via config-driven selection (
agent_class).47 files changed | ~6,850 insertions | ~190 deletions
Core Changes
Copilot SDK Agent Pipeline (
src/archi/pipelines/copilot_agent.py)CopilotAgentPipelineimplementing the full agent lifecycle using the Copilot Python SDK/v1auto-append — automatically appends/v1to base URLs for local providers (Ollama) that require itCopilot Event Adapter (
src/archi/copilot_event_adapter.py)TextMessageStart,ToolCallStart,ConfirmationRequest, etc.) into Archi'sPipelineOutputdataclassfinallyblock — fixes a leak where cleanup only ran onGeneratorExitCopilot Tool Wrappers (
src/archi/tools/)@define_toolwrappers for all 7 Archi tools:search_knowledge_base,search_local_files,search_metadata_index,list_metadata_schema,fetch_catalog_document,monit_opensearch_search,monit_opensearch_aggregationTOOL_REGISTRYin__init__.pyprovides a central registry of available toolsdocument_collector.py— shared helper for collecting documents from tool resultsAsync Bridge (
src/archi/utils/async_loop.py)AsyncLoopThread— runs a persistent asyncio event loop in a background thread, bridging Flask's sync world with the SDK's async APISecurity
Tool Allowlist (Critical Fix)
sql,report_intent,bash, etc.). A blocklist approach only blocked 6 known tools, allowing unknown tools to execute — confirmed in live testing when the model created a SQL database table in response to "remember my name is Jason."excluded_toolsblocklist withavailable_toolsallowlist — only the 7 custom Archi tools are permitted. Any new SDK built-in tools are automatically blocked._COPILOT_BUILTIN_TOOL_BLOCKLISTconstant entirelyIntegration Changes
Chat App Streaming (
src/interfaces/chat_app/app.py)stream_kwargspattern replaces direct LLM override for provider/model selectionAgent Editor API (
src/interfaces/chat_app/api.py){name, description}objects for tools (was bare strings)Pipeline Registry (
src/archi/pipelines/__init__.py)CopilotAgentPipelineregistered as selectable pipeline classMinor Fixes
src/archi/providers/__init__.py)CopilotAgentPipelinein redmine integrationconfig_service.py: empty string instead of NULL for clearedactive_agent_nameticket_manager.py: try/except around collector iteration to prevent crasheslocal_files.py: redirect detection for catalog API auth failuresmcp_utils.py: refactored MCP server configuration utilitiesBackward Compatibility
The LangGraph pipeline (
BaseReActAgent,CMSCompOpsAgent) remains fully functional. Pipeline selection is config-driven:Both pipelines pass the same 38-test feature parity matrix.
Test Coverage
New Test Files
tests/unit/test_copilot_pipeline.py— 36 tests for pipeline helpers, tool restrictions, session managementtests/unit/test_copilot_event_adapter.py— adapter event translation teststests/unit/test_adapter_error_paths.py— error path coveragetests/unit/test_chat_wrapper_stream.py— streaming integration teststests/unit/test_pipeline_integration.py— pipeline integration teststests/unit/test_tool_error_handling.py— tool error handlingtests/unit/test_import_sanity.py— import validationtests/unit/test_ticket_manager.py— ticket manager error handlingtests/test_pipeline_matrix.py— 38-test feature parity matrix (runs both pipelines)tests/ui/workflows/21-agent-management.spec.ts— agent editor Playwright teststests/ui/workflows/22-copilot-streaming.spec.ts— Copilot streaming Playwright teststests/smoke/stream_test.py— streaming smoke testtests/smoke/deploy_preflight.py— deployment preflight checksDeployment Notes
copilot-sdkPython package (added torequirements-base.txt)BaseReActAgentrequire no config changesagent_class: CopilotAgentPipelinein config and provide a GitHub token or compatible BYOK provider credentials