refactor(backend): remove local RAG mode and use knowledge_runtime exclusively#1013
sunnights wants to merge 17 commits into wecode-ai:main
Conversation
- Add authentication middleware verify_internal_token with constant-time comparison
- Add an authentication dependency to RAG operation endpoints (except health)
- Add Authorization header support to the Backend RemoteRagGateway
- Add the KNOWLEDGE_RUNTIME_TOKEN config option
- Add the environment variable to Docker Compose (commented out by default)
- Add authentication tests for knowledge_runtime and the Backend
- Rename KNOWLEDGE_RUNTIME_TOKEN to INTERNAL_SERVICE_TOKEN
- Use the same token for service-to-service authentication across Backend, knowledge_runtime, and chat_shell
- Update the config examples in docker-compose.yml and .env.example
- Update the related test cases
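The constant-time comparison these commits describe can be sketched with Python's stdlib. This is a minimal illustrative sketch: the name `verify_internal_token` is taken from the commit message, but the body here is an assumption, not the project's actual middleware.

```python
import hmac

def verify_internal_token(provided: str, expected: str) -> bool:
    # Fail closed when no token is configured at all
    if not expected:
        return False
    # hmac.compare_digest runs in constant time, so an attacker cannot
    # learn how many leading bytes matched from response timing
    return hmac.compare_digest(provided.encode("utf-8"), expected.encode("utf-8"))

print(verify_internal_token("secret-token", "secret-token"))  # True
print(verify_internal_token("secret-tokeX", "secret-token"))  # False
```

A naive `provided == expected` check would short-circuit on the first mismatched byte, which is exactly the timing leak the constant-time comparison avoids.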
…clusively

- Simplify gateway_factory.py to always return RemoteRagGateway singleton
- Remove RAG_RUNTIME_MODE config and related validation logic
- Delete LocalRagGateway and local_data_plane module
- Remove fallback logic from API endpoints and orchestrator
- Update all tests to use RemoteRagGateway mocks
- Update README.md to reflect new architecture

All RAG operations now route through knowledge_runtime service (:8200).

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Remove Backend's direct dependency on knowledge_engine module. All RAG operations now go through knowledge_runtime HTTP service via RemoteRagGateway.

Changes:
- Add storage-types API endpoint to knowledge_runtime for exposing storage type information (elasticsearch, qdrant, milvus)
- Add get_storage_types() method to RemoteRagGateway
- Modify retrievers.py to use RemoteRagGateway for storage type info
- Refactor retrieval_service.py to use RemoteRagGateway.query()
- Delete embedding module (helper function moved to runtime_resolver.py)
- Remove knowledge_engine from pyproject.toml dependencies
- Update tests to mock RemoteRagGateway instead of knowledge_engine

Architecture: Backend -> RemoteRagGateway -> knowledge_runtime -> knowledge_engine

Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Remove llama-index-* dependencies from backend and chat_shell
- Remove pymilvus, docx2txt, and related packages
- These dependencies are now only needed in knowledge_engine
- Reduces installation size and improves architecture clarity
📝 Walkthrough
Moves RAG execution out of the backend into a new FastAPI "knowledge_runtime" data-plane service, removes local data-plane code and fallback logic, makes the backend use a singleton RemoteRagGateway for all RAG operations, and updates Docker/compose, tests, configs, and protocol models for remote-only execution.
Sequence Diagram(s)
sequenceDiagram
participant Client as Client/Frontend
participant Backend as Backend (Control Plane)
participant RemoteGW as RemoteRagGateway
participant Runtime as knowledge_runtime (Data Plane)
participant KE as knowledge_engine
participant Storage as VectorStorage
Client->>Backend: POST /api/rag/retrieve
Backend->>RemoteGW: query(QueryRuntimeSpec)
RemoteGW->>Runtime: POST /internal/rag/query (RemoteQueryRequest)
Runtime->>Runtime: verify token, fetch content
Runtime->>KE: QueryExecutor.execute()
KE->>Storage: vector store ops
Storage-->>KE: results
KE-->>Runtime: RemoteQueryResponse
Runtime-->>RemoteGW: JSON response
RemoteGW-->>Backend: QueryResult
Backend-->>Client: API response
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 14
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
AGENTS.md (1)
562-613: ⚠️ Potential issue | 🟡 Minor — Update the quick-reference ports list for Knowledge Runtime.
Line 562 introduces default port 8200, but the quick reference at Line 613 still omits it.
📝 Proposed documentation update
-**Ports:** 3000 (frontend), 8000 (backend), 8001 (chat shell), 3306 (MySQL), 6379 (Redis)
+**Ports:** 3000 (frontend), 8000 (backend), 8001 (chat shell), 8200 (knowledge runtime), 3306 (MySQL), 6379 (Redis)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@AGENTS.md` around lines 562 - 613, Update the Quick Reference ports list in AGENTS.md to include the Knowledge Runtime default port 8200; find the "Quick Reference" section (the code block showing start services/tests/formatting) and add "8200 (knowledge runtime)" to the Ports line so it reads something like "Ports: 3000 (frontend), 8000 (backend), 8001 (chat shell), 8200 (knowledge runtime), 3306 (MySQL), 6379 (Redis)".
backend/app/services/knowledge/orchestrator.py (1)
2130-2134: ⚠️ Potential issue | 🟡 Minor — Update the stale fallback wording.
`retrieve_knowledge()` no longer supports local gateway execution or automatic fallback, but the docstring still says it does. Please update this to describe the remote-only `knowledge_runtime` path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/knowledge/orchestrator.py` around lines 2130 - 2134, The docstring for retrieve_knowledge() is stale: it claims local gateway execution and automatic fallback but the function now routes only via the remote-only knowledge_runtime path. Update the docstring in orchestrator.py to remove references to local gateway execution and automatic fallback, and instead state that retrieve_knowledge() is the unified entry point for MCP tools and the Open API that invokes the remote-only knowledge_runtime RAG gateway path (include mention of the knowledge_runtime flow and any relevant parameters it uses).
backend/app/api/endpoints/internal/rag.py (1)
235-251: ⚠️ Potential issue | 🔴 Critical — Route `direct_injection` mode through `list_chunks()` instead of `query()`.
`_execute_query()` unconditionally calls `rag_gateway.query()` for all route modes. However, when `_finalize_query_runtime_spec()` resolves `route_mode` to `"direct_injection"`, the request should route through `rag_gateway.list_chunks()` (as documented in `backend/app/services/rag/README.md` lines 73–74). Currently, direct_injection specs bypass the remote gateway's chunk retrieval path and instead hit the query endpoint without `knowledge_base_configs` populated, violating the intended routing architecture.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/endpoints/internal/rag.py` around lines 235 - 251, The function _execute_query currently always calls rag_gateway.query(), but when runtime_spec.route_mode == "direct_injection" it must route to rag_gateway.list_chunks() instead; update _execute_query to first ensure knowledge_base_configs is populated the same way as the existing branch (use runtime_resolver.build_query_knowledge_base_configs with db, knowledge_base_ids, user_id, user_name on runtime_spec) and then, if getattr(runtime_spec, "route_mode", None) == "direct_injection", call and return await rag_gateway.list_chunks(runtime_spec, db=db), otherwise call and return await rag_gateway.query(runtime_spec, db=db); keep using get_query_gateway() and the same runtime_spec/model_copy flow.
backend/tests/integration/test_rag_remote_mode.py (1)
21-21: ⚠️ Potential issue | 🟡 Minor — Unused import after fallback test removal.
`RemoteRagGatewayError` is no longer referenced anywhere in this file (the prior `test_internal_retrieve_falls_back_to_local_when_remote_query_fails_integration` was the only user). Remove the import.
🧹 Proposed cleanup
-from app.services.rag.remote_gateway import RemoteRagGatewayError
 from app.services.rag.runtime_specs import DirectInjectionBudget, QueryRuntimeSpec
Based on learnings: "Delete dead code aggressively: ensure that deprecated, unused, or obsolete code is removed regardless of effort required".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/tests/integration/test_rag_remote_mode.py` at line 21, Remove the now-unused import RemoteRagGatewayError from the top of the test module (import statement "from app.services.rag.remote_gateway import RemoteRagGatewayError") since the fallback test that referenced it was removed; simply delete that import line to eliminate the dead/unused symbol and ensure no other references to RemoteRagGatewayError remain in test_rag_remote_mode.py.
backend/app/api/endpoints/adapter/retrievers.py (1)
38-108: ⚠️ Potential issue | 🟡 Minor — Missing error handling for `RemoteRagGatewayError` on storage-types endpoints.
Both `get_storage_retrieval_methods` (lines 38–71) and `get_storage_type_retrieval_methods` (lines 74–108) now perform a remote HTTP call but let `RemoteRagGatewayError` propagate as an unhandled 500. Compare with `test_retriever_connection` (line 295), which catches it and returns a structured error. If `knowledge_runtime` is down or misconfigured, the frontend will get a confusing 500 instead of a meaningful failure.
Also, the case-insensitive comparison on line 98 only lowercases the user input (`storage_type.lower()`) but assumes `type_info.type` is already lowercase. If the remote service ever returns mixed-case names, the match silently fails. Consider `.lower()` on both sides.
🛠️ Suggested fix
 async def get_storage_retrieval_methods():
     ...
     _check_rag_available()
-    gateway = RemoteRagGateway()
-    response = await gateway.get_storage_types()
+    try:
+        gateway = RemoteRagGateway()
+        response = await gateway.get_storage_types()
+    except RemoteRagGatewayError as e:
+        logger.error(f"Failed to fetch storage types: {e}")
+        raise HTTPException(status_code=502, detail=f"Knowledge Runtime unavailable: {e}")
     ...
-    if type_info.type == storage_type.lower():
+    if type_info.type.lower() == storage_type.lower():
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/endpoints/adapter/retrievers.py` around lines 38 - 108, Both endpoints get_storage_retrieval_methods and get_storage_type_retrieval_methods call gateway.get_storage_types without handling RemoteRagGatewayError and do a one-sided lowercase comparison; wrap the await gateway.get_storage_types() call in a try/except that catches RemoteRagGatewayError (the same pattern used in test_retriever_connection), and on error raise an HTTPException with a clear status_code (e.g., 502 or 424) and the gateway error message; also make the storage type comparison robust by lowercasing both sides (compare type_info.type.lower() to storage_type.lower()) so mixed-case names from the remote service still match.
🧹 Nitpick comments (11)
knowledge_runtime/knowledge_runtime/core/logging.py (1)
33-43: Hoist the `get_request_id` import out of the filter hot path.
`RequestIdFilter.filter` runs on every log record, so importing `shared.telemetry.context.span` on each call — even with `sys.modules` caching — adds unnecessary work and keeps dead `ImportError` handling for what is effectively a hard runtime dependency. Importing once at module scope makes the hot path a single function call and surfaces missing-dependency errors at startup rather than per-record.
♻️ Proposed refactor
 import logging
 import os
 import sys
 from logging.handlers import TimedRotatingFileHandler
+
+try:
+    from shared.telemetry.context.span import get_request_id
+except ImportError:  # pragma: no cover - shared is a required dep
+    def get_request_id():  # type: ignore[misc]
+        return None
@@
     def filter(self, record: logging.LogRecord) -> bool:
         ...
-        try:
-            from shared.telemetry.context.span import get_request_id
-
-            request_id = get_request_id()
-            record.request_id = request_id if request_id else "-"
-        except ImportError:
-            # If telemetry module is not available, use placeholder
-            record.request_id = "-"
-        except Exception:
-            # Fallback for any other errors
-            record.request_id = "-"
-
+        try:
+            request_id = get_request_id()
+        except Exception:  # filters must never raise
+            request_id = None
+        record.request_id = request_id or "-"
         return True
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_runtime/knowledge_runtime/core/logging.py` around lines 33 - 43, Hoist the import of get_request_id out of RequestIdFilter.filter by importing it once at module scope: attempt "from shared.telemetry.context.span import get_request_id" at top of the module and handle ImportError there (e.g., set a local stub get_request_id = lambda: None or leave it None) so missing-dependency errors surface at startup; then simplify RequestIdFilter.filter to call get_request_id() (or check if callable) and set record.request_id = get_request_id() or "-" accordingly, removing the per-record try/except ImportError block.
backend/app/services/rag/runtime_resolver.py (1)
35-60: Type hint drifts from the early-return behavior.
`_process_custom_headers_placeholders` is annotated to return `dict[str, Any]`, but the early return at line 51 passes through `custom_headers` unchanged when it's falsy or non-dict — possibly returning `None`/`list`/etc. The sole current caller (line 687) already checks `isinstance(custom_headers, dict)` before invoking, so this branch is unreachable in practice, but a future reuse could silently break the contract.
Tightening either the caller's contract or the helper's fallback is a cheap safety win.
♻️ Suggested tightening
 def _process_custom_headers_placeholders(
-    custom_headers: dict[str, Any], user_name: str | None = None
+    custom_headers: Any, user_name: str | None = None
 ) -> dict[str, Any]:
     ...
-    if not custom_headers or not isinstance(custom_headers, dict):
-        return custom_headers
+    if not custom_headers or not isinstance(custom_headers, dict):
+        return {}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/rag/runtime_resolver.py` around lines 35 - 60, The function _process_custom_headers_placeholders currently declares return type dict[str, Any] but returns the raw custom_headers on early-exit, which can violate the contract; fix by normalizing inputs: if not custom_headers (None/empty/False) return an empty dict{}; if custom_headers is present but not a dict, raise a TypeError to fail fast; then proceed to build data_sources and call build_default_headers_with_placeholders as before. This keeps the return type stable and surfaces invalid inputs immediately.
knowledge_runtime/knowledge_runtime/services/admin_executor.py (1)
13-13: Unused import: `DocumentService`.
`DocumentService` is imported but never referenced in this module. Please remove it to avoid drifting dependencies and accidental coupling to `knowledge_engine.services.document_service`.
-from knowledge_engine.services.document_service import DocumentService
-from knowledge_engine.storage.factory import (
+from knowledge_engine.storage.factory import (
     create_storage_backend_from_runtime_config,
     get_all_storage_retrieval_methods,
     get_supported_storage_types,
 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_runtime/knowledge_runtime/services/admin_executor.py` at line 13, Remove the unused import DocumentService from the import list in admin_executor.py: delete the line "from knowledge_engine.services.document_service import DocumentService" and ensure no other references to DocumentService remain in the module (or replace any accidental usage with the correct service). After removal, run linters/tests to confirm no import-related errors.
knowledge_runtime/tests/test_logging.py (1)
66-206: Add a fixture to save/restore root logger state and close file handlers.
Every test mutates the global root logger (and `uvicorn.access`) via `handlers.clear()` + `setup_logging(...)`, but none of them close the handlers or restore prior state when finished. This has two consequences:
- File handlers opened by `setup_logging(log_file_enabled=True, log_dir=tmpdir, ...)` remain attached to the root logger after the test returns. On Windows (and on any platform where the test order causes a later write) the open file descriptor will block `TemporaryDirectory` cleanup, and log records emitted by subsequent tests can still be written into files under an already-removed tmpdir.
- The last test's handlers leak into the rest of the pytest session, polluting unrelated tests in the package.
Consider an `autouse` fixture that snapshots/clears handlers before each test and closes/restores them after:
♻️ Proposed fixture
+@pytest.fixture(autouse=True)
+def _reset_logging():
+    loggers = [logging.getLogger(), logging.getLogger("uvicorn.access")]
+    saved = [(lg, list(lg.handlers), lg.level, lg.propagate) for lg in loggers]
+    for lg in loggers:
+        lg.handlers.clear()
+    try:
+        yield
+    finally:
+        for lg, handlers, level, propagate in saved:
+            for h in lg.handlers:
+                try:
+                    h.close()
+                except Exception:
+                    pass
+            lg.handlers = handlers
+            lg.setLevel(level)
+            lg.propagate = propagate
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_runtime/tests/test_logging.py` around lines 66 - 206, Tests mutate the global root logger and leak file handlers; add an autouse pytest fixture in the test module that snapshots the root logger and any relevant child loggers (e.g., "uvicorn.access", "uvicorn", "fastapi", "httpx", "httpcore") before each test, clears handlers and sets desired state, and after the test closes any file/stream handlers, restores the original handlers, levels and propagate flags so setup_logging (and tests like test_setup_logging_file_mode, test_setup_logging_access_log_separate_file, test_setup_logging_access_log_console_mode) cannot leak open file descriptors or mutated logger state across tests.
backend/app/services/rag/remote_gateway.py (1)
73-119: Duplicated header construction between `_post_model` and `_get`.
The `Authorization` header building is repeated verbatim. Extract a small helper to keep a single source of truth.
♻️ Proposed refactor
+    def _build_headers(self) -> dict[str, str]:
+        headers: dict[str, str] = {}
+        if self._auth_token:
+            headers["Authorization"] = f"Bearer {self._auth_token}"
+        return headers
+
     async def _post_model(self, path: str, payload: Any) -> dict[str, Any]:
-        headers = {}
-        if self._auth_token:
-            headers["Authorization"] = f"Bearer {self._auth_token}"
-
         try:
             async with httpx.AsyncClient(timeout=self._timeout) as client:
                 response = await client.post(
                     f"{self._base_url}{path}",
                     json=payload.model_dump(mode="json", exclude_none=True),
-                    headers=headers,
+                    headers=self._build_headers(),
                 )
     ...
     async def _get(self, path: str) -> dict[str, Any]:
         """Make a GET request to knowledge_runtime."""
-        headers = {}
-        if self._auth_token:
-            headers["Authorization"] = f"Bearer {self._auth_token}"
-
         try:
             async with httpx.AsyncClient(timeout=self._timeout) as client:
                 response = await client.get(
                     f"{self._base_url}{path}",
-                    headers=headers,
+                    headers=self._build_headers(),
                 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/rag/remote_gateway.py` around lines 73 - 119, Both _post_model and _get duplicate the Authorization header construction; create a small helper method (e.g., _build_headers or _auth_headers) that returns the headers dict using self._auth_token and reuse it from _post_model and _get (replace the inline header logic with a call to that helper); ensure the helper lives on the same class as _post_model/_get and is used everywhere this pattern appears so header construction is a single source of truth.
backend/app/services/rag/retrieval_service.py (1)
729-740: Redundant local import and duplicate `RagRuntimeResolver` instantiation.
`RagRuntimeResolver` is already imported at module level (line 19) and `self.runtime_resolver` is initialized in `__init__` (line 41). Creating a new instance here is wasteful and inconsistent with the rest of the class. Similarly, `get_list_chunks_gateway` should be hoisted to the module-level imports next to `get_query_gateway` (line 18).
♻️ Proposed refactor
-from app.services.rag.gateway_factory import get_query_gateway
+from app.services.rag.gateway_factory import (
+    get_list_chunks_gateway,
+    get_query_gateway,
+)

-        from app.services.rag.gateway_factory import get_list_chunks_gateway
-        from app.services.rag.runtime_resolver import RagRuntimeResolver
-
         # Build runtime spec via resolver
-        runtime_resolver = RagRuntimeResolver()
-        spec = runtime_resolver.build_internal_list_chunks_runtime_spec(
+        spec = self.runtime_resolver.build_internal_list_chunks_runtime_spec(
             db=db,
             knowledge_base_id=knowledge_base_id,
             max_chunks=max_chunks,
             query=query,
             metadata_condition=metadata_condition,
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/rag/retrieval_service.py` around lines 729 - 740, The code redundantly imports get_list_chunks_gateway locally and creates a new RagRuntimeResolver instead of using the module-level imports and the instance on the service; remove the local import and the local instantiation and use the existing module-level get_list_chunks_gateway and the instance attribute self.runtime_resolver (already created in __init__) when calling build_internal_list_chunks_runtime_spec and when obtaining the gateway (consistent with get_query_gateway usage).
knowledge_runtime/knowledge_runtime/main.py (1)
24-25: Missing return type annotation on `lifespan`.
Per coding guidelines, Python functions require type hints. `lifespan` returns an async context yielding nothing.
💡 Proposed annotation
+from collections.abc import AsyncIterator
+
 @asynccontextmanager
-async def lifespan(app: FastAPI):
+async def lifespan(app: FastAPI) -> AsyncIterator[None]:
     """Application lifespan handler for startup and shutdown events."""
As per coding guidelines: "Follow PEP 8, use Black formatter (line length: 88), isort for import organization, and require type hints".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_runtime/knowledge_runtime/main.py` around lines 24 - 25, The function lifespan is missing a return type hint; annotate it as returning an asynchronous generator yielding nothing (e.g., add "-> AsyncGenerator[None, None]" to the lifespan signature) and import the AsyncGenerator type from typing (or typing_extensions if required), so the asynccontextmanager-decorated function has a proper type hint; update the lifespan definition and add the needed import while keeping the function behavior unchanged.
knowledge_runtime/start.sh (1)
244-248: Dead exit-code check under `set -e`.
With `set -e` at line 10, a non-zero exit from `uv sync` already aborts the script, so the `if [ $? -ne 0 ]` branch is unreachable. Either drop the check or invert it with `if ! uv sync;` for explicit handling (shellcheck SC2181).
🛠️ Suggested fix
-# Sync dependencies
-uv sync
-if [ $? -ne 0 ]; then
-    echo -e "${RED}Error: Failed to install dependencies${NC}"
-    exit 1
-fi
+# Sync dependencies
+if ! uv sync; then
+    echo -e "${RED}Error: Failed to install dependencies${NC}"
+    exit 1
+fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_runtime/start.sh` around lines 244 - 248, The post-command exit-code check after running "uv sync" is dead because "set -e" causes the script to exit on any non-zero status; either remove the redundant "if [ $? -ne 0 ]" block entirely or change the construct to explicitly handle failure by running "if ! uv sync; then" and keeping the echo/exit handling; update the block surrounding the "uv sync" invocation (and any related echo/exit lines) accordingly so there is no unreachable code under the existing "set -e" behavior.
docker-compose.yml (1)
263-299: Nit: `knowledge_runtime` depends on `backend` for content fetch but the backend has no healthcheck.
`depends_on: backend` (lines 288–289) only waits for the container to start, not for it to be ready. Since `knowledge_runtime`'s `ContentFetcher` needs the backend for `BackendAttachmentStreamContentRef` fetches, consider adding a healthcheck on the `backend` service and using `condition: service_healthy` here — otherwise early content fetches may fail on cold-start. Non-blocking.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docker-compose.yml` around lines 263 - 299, Add a proper healthcheck to the backend service and make knowledge_runtime wait for that health status before starting: define a backend healthcheck that verifies the backend readiness endpoint (e.g., internal health route used by ContentFetcher/BackendAttachmentStreamContentRef) and update knowledge_runtime's depends_on to require the backend's service_healthy condition so the ContentFetcher won't attempt early fetches on cold-start; reference the services by name (backend, knowledge_runtime) and the components ContentFetcher and BackendAttachmentStreamContentRef when making the changes.
knowledge_runtime/knowledge_runtime/services/content_fetcher.py (1)
213-224: `Content-Disposition` parsing misses RFC 5987 and multi-param headers.
The current parser will mis-handle common real-world headers like:
- `attachment; filename*=UTF-8''na%C3%AFve.pdf` (RFC 5987 encoded form — ignored entirely).
- `attachment; filename="a.pdf"; size=123` — the split-on-`filename=` + strip-quotes leaves `a.pdf"; size=123` on the wrong boundary.
Consider using `email.message.Message` or werkzeug/rfc6266-style parsing, or at minimum split the tail on `;` and handle `filename*=`:
-        if "filename=" in content_disposition:
-            parts = content_disposition.split("filename=")
-            if len(parts) > 1:
-                filename = parts[1].strip('"').strip("'")
+        if "filename" in content_disposition:
+            from email.message import Message
+            msg = Message()
+            msg["content-disposition"] = content_disposition
+            filename = msg.get_filename() or ""
+        if filename:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_runtime/knowledge_runtime/services/content_fetcher.py` around lines 213 - 224, The Content-Disposition parsing in content_fetcher.py (the block that inspects response.headers.get("Content-Disposition")) fails for RFC5987 and multi-param headers; update the logic in that parsing block (or the surrounding function that extracts filename/extension) to robustly parse parameters by splitting the header into params on ";" and handling both filename and filename* keys (use RFC5987 decoding via urllib.parse.unquote and charset/lang stripping), and ensure you stop at param boundaries (trim trailing params like size=) before extracting the filename and extension; alternatively replace the custom parse with a standard parser such as email.message.Message.get_param / cgi.parse_header or an rfc6266 helper to correctly handle quoted filenames and encoded filename* values.knowledge_runtime/knowledge_runtime/services/query_executor.py (1)
56-56: Sort key treats score `0` (and negatives) as missing.
`r.score or 0` conflates `None`, `0.0`, and any falsy/negative score. If a backend ever returns `0.0` or signed distances, the sort order becomes inconsistent with real ranking. Prefer an explicit None check:
-    all_records.sort(key=lambda r: r.score or 0, reverse=True)
+    all_records.sort(
+        key=lambda r: r.score if r.score is not None else float("-inf"),
+        reverse=True,
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_runtime/knowledge_runtime/services/query_executor.py` at line 56, The sort key in query_executor.py currently treats falsy scores (like 0.0 or negative values) as missing; update the lambda used in the all_records.sort call inside the relevant query execution logic to explicitly check for None and only substitute a default when score is None (i.e., return the record's score when it is not None, otherwise return the fallback value), so zero and negative scores are preserved in ordering.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@AGENTS.md`:
- Around line 552-561: Add the missing GET /internal/rag/storage-types endpoint
to the Endpoints list in AGENTS.md so the quick reference matches the actual
routes; update the endpoints block (the list that currently contains GET
/internal/rag/health, POST /internal/rag/index, etc.) to include `GET
/internal/rag/storage-types` and, if desired, a short description like "List
available storage types" to mirror the route implemented in
knowledge_runtime/knowledge_runtime/api/endpoints/admin.py.
In `@backend/tests/services/rag/test_remote_gateway.py`:
- Around line 717-758: The test test_gateway_no_auth_header_when_token_empty is
brittle because it assumes settings.INTERNAL_SERVICE_TOKEN is empty; before
creating RemoteRagGateway set settings.INTERNAL_SERVICE_TOKEN to "" (use
monkeypatch.setattr(settings, "INTERNAL_SERVICE_TOKEN", "")) to isolate the test
and ensure RemoteRagGateway(auth_token="") does not fall back to the settings
token—mirror the approach used in
test_gateway_uses_settings_token_when_not_provided to guarantee the headers
assertion remains deterministic.
In `@docker/knowledge_runtime/Dockerfile`:
- Around line 31-42: Add an unprivileged user and switch to it before the
container starts: create a user/group (e.g., appuser), ensure /app and /app/logs
are owned and writable by that user (chown/chmod as needed), then add a USER
appuser line prior to CMD so uvicorn runs unprivileged; keep the existing
HEALTHCHECK and EXPOSE lines unchanged but ensure the health endpoint is
reachable by the non-root user.
In `@docs/plans/2026-04-19-knowledge-runtime-architecture.md`:
- Line 34: The fenced code blocks in the document containing ASCII diagrams and
flow charts are missing language identifiers (causing markdownlint MD040);
update each fenced block that contains ASCII art or flow charts to include an
appropriate language tag such as "text" or a more specific identifier (e.g.,
"graphviz" or "mermaid" if applicable) so the blocks read as ```text (or
```graphviz/```mermaid) and satisfy MD040 while preserving the diagrams'
rendering.
In `@knowledge_runtime/.env.example`:
- Around line 13-16: The dotenv keys are out of the expected lint order; move
LOG_DIR to appear before LOG_FILE_ENABLED in the .env example so keys read
LOG_DIR, LOG_FILE_ENABLED, LOG_LEVEL (update the block containing
LOG_FILE_ENABLED, LOG_DIR, LOG_LEVEL accordingly) to satisfy dotenv-linter's
expected ordering for those variables.
In `@knowledge_runtime/knowledge_runtime/config.py`:
- Around line 36-39: The current default internal_service_token = "" leaves
internal endpoints unprotected; change the config so an internal_service_token
is required at startup: update the settings model by making
internal_service_token a required field (e.g., internal_service_token: str or
internal_service_token: str = Field(...)) in the same config class so the app
fails to start if ENV INTERNAL_SERVICE_TOKEN is not set; ensure any code that
reads internal_service_token (the auth dependency that checks tokens) keeps
using that field unchanged.
In `@knowledge_runtime/knowledge_runtime/main.py`:
- Around line 51-69: The global_exception_handler currently exposes raw
exception messages via RemoteRagError (message=str(exc) and
details={"exception_type": ...}); change it to return a generic message (e.g.,
"Internal server error") and avoid sending exception_type or raw details to
callers by default; keep the full exception in server logs via logger.exception,
and only include detailed exception info in the RemoteRagError.details when
running a non-production mode (check an environment/config flag), referencing
the global_exception_handler and RemoteRagError (and be mindful that
RemoteRagGatewayError consumers will forward message verbatim).
In `@knowledge_runtime/knowledge_runtime/middleware/auth.py`:
- Around line 38-40: The current check that returns when expected_token is falsy
(variable expected_token) creates a fail-open auth hole if
INTERNAL_SERVICE_TOKEN is missing; change this to fail closed: if expected_token
is missing, respond with a 401/403 and an error log instead of returning. If you
want a dev opt-out, add an explicit env flag (e.g., DEV_AUTH_DISABLED or
ALLOW_DEV_UNAUTH) and only bypass authentication when that flag is truthy and
explicitly set, otherwise require INTERNAL_SERVICE_TOKEN; update the middleware
(the auth function referencing expected_token) to enforce this and log a clear
warning when dev opt-out is used.
In `@knowledge_runtime/knowledge_runtime/services/content_fetcher.py`:
- Around line 56-175: The retry on fetch() currently only retries
httpx.TransportError but both _fetch_from_presigned_url and
_fetch_from_backend_stream catch TransportError and raise ContentFetchError, so
tenacity never retries; update fetch() to retry based on
ContentFetchError.retryable by replacing the decorator’s retry argument with a
predicate (e.g., use tenacity.retry_if_exception with a helper function
_is_retryable that returns isinstance(exc, ContentFetchError) and exc.retryable)
so fetch() will retry when inner methods raise
ContentFetchError(retryable=True); keep the inner handlers raising
ContentFetchError as they are and only change the `@retry`(...) on fetch() to
reference the new predicate.
In `@knowledge_runtime/pyproject.toml`:
- Line 9: Update the requires-python spec in pyproject.toml to allow all Python
3.13 patch releases by replacing the current constraint requires-python =
">=3.10,<=3.13" with a range using an exclusive upper bound, e.g.
requires-python = ">=3.10,<3.14", so the requires-python field correctly
includes Python 3.13.x versions.
In `@knowledge_runtime/start.sh`:
- Around line 232-290: The script assumes it's run from the knowledge_runtime
directory; update start.sh so all workspace-relative operations (the
pyproject.toml existence check, creating/using .venv, copying .env.example,
computing PROJECT_ROOT, and invoking .venv/bin/python) run relative to the
script's directory: determine the script directory (using the equivalent of
BASH_SOURCE/dirname resolution), cd into that directory at the top of start.sh
before the pyproject.toml check, and ensure PROJECT_ROOT is computed relative to
that script directory (so the later export PYTHONPATH and .venv invocation
reference the correct paths).
In `@knowledge_runtime/tests/conftest.py`:
- Around line 7-31: Add explicit return type hints and move the MagicMock import
to the module top: import MagicMock from unittest.mock once at the top of the
file, then annotate the two fixtures mock_storage_backend and mock_embed_model
with return types (e.g., -> MagicMock) so the fixtures are typed; keep fixture
bodies identical but remove the inline imports and ensure isort/Black formatting
rules are followed.
In `@start.sh`:
- Around line 1586-1595: The knowledge_runtime start command uses export
INTERNAL_SERVICE_TOKEN=\$INTERNAL_SERVICE_TOKEN which can mismatch the backend
because the backend start does not export the same variable; update the backend
start invocation (the other start_service call that launches "backend") to also
export INTERNAL_SERVICE_TOKEN from the top-level environment so both services
receive the identical token, and ensure the knowledge_runtime start_service call
references the same unescaped VARIABLE (so it reads the shell/exported value)
rather than relying on its own .env; locate the start_service calls for
"knowledge_runtime" and "backend" and make the backend start command include
export INTERNAL_SERVICE_TOKEN=\$INTERNAL_SERVICE_TOKEN (or centralize the token
read at the top-level load_config and export it once before any start_service
calls).
---
Outside diff comments:
In `@AGENTS.md`:
- Around line 562-613: Update the Quick Reference ports list in AGENTS.md to
include the Knowledge Runtime default port 8200; find the "Quick Reference"
section (the code block showing start services/tests/formatting) and add "8200
(knowledge runtime)" to the Ports line so it reads something like "Ports: 3000
(frontend), 8000 (backend), 8001 (chat shell), 8200 (knowledge runtime), 3306
(MySQL), 6379 (Redis)".
In `@backend/app/api/endpoints/adapter/retrievers.py`:
- Around line 38-108: Both endpoints get_storage_retrieval_methods and
get_storage_type_retrieval_methods call gateway.get_storage_types without
handling RemoteRagGatewayError and do a one-sided lowercase comparison; wrap the
await gateway.get_storage_types() call in a try/except that catches
RemoteRagGatewayError (the same pattern used in test_retriever_connection), and
on error raise an HTTPException with a clear status_code (e.g., 502 or 424) and
the gateway error message; also make the storage type comparison robust by
lowercasing both sides (compare type_info.type.lower() to storage_type.lower())
so mixed-case names from the remote service still match.
In `@backend/app/api/endpoints/internal/rag.py`:
- Around line 235-251: The function _execute_query currently always calls
rag_gateway.query(), but when runtime_spec.route_mode == "direct_injection" it
must route to rag_gateway.list_chunks() instead; update _execute_query to first
ensure knowledge_base_configs is populated the same way as the existing branch
(use runtime_resolver.build_query_knowledge_base_configs with db,
knowledge_base_ids, user_id, user_name on runtime_spec) and then, if
getattr(runtime_spec, "route_mode", None) == "direct_injection", call and return
await rag_gateway.list_chunks(runtime_spec, db=db), otherwise call and return
await rag_gateway.query(runtime_spec, db=db); keep using get_query_gateway() and
the same runtime_spec/model_copy flow.
In `@backend/app/services/knowledge/orchestrator.py`:
- Around line 2130-2134: The docstring for retrieve_knowledge() is stale: it
claims local gateway execution and automatic fallback but the function now
routes only via the remote-only knowledge_runtime path. Update the docstring in
orchestrator.py to remove references to local gateway execution and automatic
fallback, and instead state that retrieve_knowledge() is the unified entry point
for MCP tools and the Open API that invokes the remote-only knowledge_runtime
RAG gateway path (include mention of the knowledge_runtime flow and any relevant
parameters it uses).
In `@backend/tests/integration/test_rag_remote_mode.py`:
- Line 21: Remove the now-unused import RemoteRagGatewayError from the top of
the test module (import statement "from app.services.rag.remote_gateway import
RemoteRagGatewayError") since the fallback test that referenced it was removed;
simply delete that import line to eliminate the dead/unused symbol and ensure no
other references to RemoteRagGatewayError remain in test_rag_remote_mode.py.
---
Nitpick comments:
In `@backend/app/services/rag/remote_gateway.py`:
- Around line 73-119: Both _post_model and _get duplicate the Authorization
header construction; create a small helper method (e.g., _build_headers or
_auth_headers) that returns the headers dict using self._auth_token and reuse it
from _post_model and _get (replace the inline header logic with a call to that
helper); ensure the helper lives on the same class as _post_model/_get and is
used everywhere this pattern appears so header construction is a single source
of truth.
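A minimal sketch of the single-source-of-truth helper, assuming the gateway stores the token on `self._auth_token` as the comment describes:

```python
class RemoteRagGatewaySketch:
    """Illustrative skeleton; only the header-building concern is shown."""

    def __init__(self, auth_token: str = "") -> None:
        self._auth_token = auth_token

    def _auth_headers(self) -> dict[str, str]:
        # Reused by _post_model and _get so the Authorization header is
        # constructed in exactly one place.
        if not self._auth_token:
            return {}
        return {"Authorization": f"Bearer {self._auth_token}"}
```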
In `@backend/app/services/rag/retrieval_service.py`:
- Around line 729-740: The code redundantly imports get_list_chunks_gateway
locally and creates a new RagRuntimeResolver instead of using the module-level
imports and the instance on the service; remove the local import and the local
instantiation and use the existing module-level get_list_chunks_gateway and the
instance attribute self.runtime_resolver (already created in __init__) when
calling build_internal_list_chunks_runtime_spec and when obtaining the gateway
(consistent with get_query_gateway usage).
In `@backend/app/services/rag/runtime_resolver.py`:
- Around line 35-60: The function _process_custom_headers_placeholders currently
declares return type dict[str, Any] but returns the raw custom_headers on
early-exit, which can violate the contract; fix by normalizing inputs: if not
custom_headers (None/empty/False) return an empty dict{}; if custom_headers is
present but not a dict, raise a TypeError to fail fast; then proceed to build
data_sources and call build_default_headers_with_placeholders as before. This
keeps the return type stable and surfaces invalid inputs immediately.
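The normalization step can be sketched in isolation (the placeholder-building part is elided; `normalize_custom_headers` is a hypothetical name):

```python
from typing import Any

def normalize_custom_headers(custom_headers: Any) -> dict[str, Any]:
    # None/empty/falsy input: return a dict so the declared return type always holds.
    if not custom_headers:
        return {}
    # Fail fast on wrong types instead of silently passing them through.
    if not isinstance(custom_headers, dict):
        raise TypeError(
            f"custom_headers must be a dict, got {type(custom_headers).__name__}"
        )
    return custom_headers
```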
In `@docker-compose.yml`:
- Around line 263-299: Add a proper healthcheck to the backend service and make
knowledge_runtime wait for that health status before starting: define a backend
healthcheck that verifies the backend readiness endpoint (e.g., internal health
route used by ContentFetcher/BackendAttachmentStreamContentRef) and update
knowledge_runtime's depends_on to require the backend's service_healthy
condition so the ContentFetcher won't attempt early fetches on cold-start;
reference the services by name (backend, knowledge_runtime) and the components
ContentFetcher and BackendAttachmentStreamContentRef when making the changes.
In `@knowledge_runtime/knowledge_runtime/core/logging.py`:
- Around line 33-43: Hoist the import of get_request_id out of
RequestIdFilter.filter by importing it once at module scope: attempt "from
shared.telemetry.context.span import get_request_id" at top of the module and
handle ImportError there (e.g., set a local stub get_request_id = lambda: None
or leave it None) so missing-dependency errors surface at startup; then simplify
RequestIdFilter.filter to call get_request_id() (or check if callable) and set
record.request_id = get_request_id() or "-" accordingly, removing the per-record
try/except ImportError block.
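The hoisted-import pattern looks like this; in an environment without `shared.telemetry`, the stub path is taken and every record gets `"-"`:

```python
import logging

# Import once at module scope so a missing dependency surfaces at startup,
# not on every log record.
try:
    from shared.telemetry.context.span import get_request_id
except ImportError:
    get_request_id = None  # stub when the telemetry package is unavailable

class RequestIdFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        rid = get_request_id() if callable(get_request_id) else None
        record.request_id = rid or "-"
        return True
```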
In `@knowledge_runtime/knowledge_runtime/main.py`:
- Around line 24-25: The function lifespan is missing a return type hint;
annotate it as returning an asynchronous generator yielding nothing (e.g., add
"-> AsyncGenerator[None, None]" to the lifespan signature) and import the
AsyncGenerator type from typing (or typing_extensions if required), so the
asynccontextmanager-decorated function has a proper type hint; update the
lifespan definition and add the needed import while keeping the function
behavior unchanged.
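The annotated signature looks like this (a generic sketch, with the app parameter left untyped since FastAPI is not imported here):

```python
import asyncio
from collections.abc import AsyncGenerator
from contextlib import asynccontextmanager

@asynccontextmanager
async def lifespan(app: object) -> AsyncGenerator[None, None]:
    # startup work would go here
    yield
    # shutdown work would go here
```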
In `@knowledge_runtime/knowledge_runtime/services/admin_executor.py`:
- Line 13: Remove the unused import DocumentService from the import list in
admin_executor.py: delete the line "from
knowledge_engine.services.document_service import DocumentService" and ensure no
other references to DocumentService remain in the module (or replace any
accidental usage with the correct service). After removal, run linters/tests to
confirm no import-related errors.
In `@knowledge_runtime/knowledge_runtime/services/content_fetcher.py`:
- Around line 213-224: The Content-Disposition parsing in content_fetcher.py
(the block that inspects response.headers.get("Content-Disposition")) fails for
RFC5987 and multi-param headers; update the logic in that parsing block (or the
surrounding function that extracts filename/extension) to robustly parse
parameters by splitting the header into params on ";" and handling both filename
and filename* keys (use RFC5987 decoding via urllib.parse.unquote and
charset/lang stripping), and ensure you stop at param boundaries (trim trailing
params like size=) before extracting the filename and extension; alternatively
replace the custom parse with a standard parser such as
email.message.Message.get_param / cgi.parse_header or an rfc6266 helper to
correctly handle quoted filenames and encoded filename* values.
In `@knowledge_runtime/knowledge_runtime/services/query_executor.py`:
- Line 56: The sort key in query_executor.py currently treats falsy scores (like
0.0 or negative values) as missing; update the lambda used in the
all_records.sort call inside the relevant query execution logic to explicitly
check for None and only substitute a default when score is None (i.e., return
the record's score when it is not None, otherwise return the fallback value), so
zero and negative scores are preserved in ordering.
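The None-only fallback can be sketched as a key function over dict-shaped records (an illustrative shape, not the runtime's actual record type):

```python
def sort_score(record: dict) -> float:
    # Substitute the fallback only when score is actually None; 0.0 and
    # negative scores are real values and must keep their ordering.
    score = record.get("score")
    return score if score is not None else float("-inf")

records = [{"id": 1, "score": 0.0}, {"id": 2, "score": None}, {"id": 3, "score": -0.5}]
records.sort(key=sort_score, reverse=True)
```

After sorting, the zero score ranks above the negative one, and only the None record falls to the end.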
In `@knowledge_runtime/start.sh`:
- Around line 244-248: The post-command exit-code check after running "uv sync"
is dead because "set -e" causes the script to exit on any non-zero status;
either remove the redundant "if [ $? -ne 0 ]" block entirely or change the
construct to explicitly handle failure by running "if ! uv sync; then" and
keeping the echo/exit handling; update the block surrounding the "uv sync"
invocation (and any related echo/exit lines) accordingly so there is no
unreachable code under the existing "set -e" behavior.
In `@knowledge_runtime/tests/test_logging.py`:
- Around line 66-206: Tests mutate the global root logger and leak file
handlers; add an autouse pytest fixture in the test module that snapshots the
root logger and any relevant child loggers (e.g., "uvicorn.access", "uvicorn",
"fastapi", "httpx", "httpcore") before each test, clears handlers and sets
desired state, and after the test closes any file/stream handlers, restores the
original handlers, levels and propagate flags so setup_logging (and tests like
test_setup_logging_file_mode, test_setup_logging_access_log_separate_file,
test_setup_logging_access_log_console_mode) cannot leak open file descriptors or
mutated logger state across tests.
⛔ Files ignored due to path filters (3)
- backend/uv.lock is excluded by !**/*.lock
- chat_shell/uv.lock is excluded by !**/*.lock
- knowledge_runtime/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (68)
.env.example, AGENTS.md, backend/.env.example, backend/app/api/endpoints/adapter/retrievers.py, backend/app/api/endpoints/internal/rag.py, backend/app/api/endpoints/rag.py, backend/app/core/config.py, backend/app/services/knowledge/orchestrator.py, backend/app/services/rag/README.md, backend/app/services/rag/embedding/__init__.py, backend/app/services/rag/embedding/factory.py, backend/app/services/rag/gateway_factory.py, backend/app/services/rag/local_data_plane/__init__.py, backend/app/services/rag/local_data_plane/administration.py, backend/app/services/rag/local_data_plane/indexing.py, backend/app/services/rag/local_data_plane/retrieval.py, backend/app/services/rag/local_gateway.py, backend/app/services/rag/remote_gateway.py, backend/app/services/rag/retrieval_service.py, backend/app/services/rag/runtime_resolver.py, backend/pyproject.toml, backend/tests/api/endpoints/internal/test_rag_retrieve_endpoint.py, backend/tests/api/endpoints/test_rag_api.py, backend/tests/core/test_config.py, backend/tests/integration/test_rag_remote_mode.py, backend/tests/services/rag/test_local_data_plane_indexing.py, backend/tests/services/rag/test_local_data_plane_retrieval.py, backend/tests/services/rag/test_local_gateway.py, backend/tests/services/rag/test_remote_gateway.py, backend/tests/services/rag/test_retrieval_service.py, chat_shell/pyproject.toml, docker-compose.build.yml, docker-compose.yml, docker/knowledge_runtime/Dockerfile, docs/plans/2026-04-19-knowledge-runtime-architecture.md, knowledge_runtime/.env.example, knowledge_runtime/knowledge_runtime/__init__.py, knowledge_runtime/knowledge_runtime/api/__init__.py, knowledge_runtime/knowledge_runtime/api/endpoints/__init__.py, knowledge_runtime/knowledge_runtime/api/endpoints/admin.py, knowledge_runtime/knowledge_runtime/api/endpoints/health.py, knowledge_runtime/knowledge_runtime/api/endpoints/index.py, knowledge_runtime/knowledge_runtime/api/endpoints/query.py, knowledge_runtime/knowledge_runtime/api/router.py, knowledge_runtime/knowledge_runtime/config.py, knowledge_runtime/knowledge_runtime/core/__init__.py, knowledge_runtime/knowledge_runtime/core/logging.py, knowledge_runtime/knowledge_runtime/main.py, knowledge_runtime/knowledge_runtime/middleware/__init__.py, knowledge_runtime/knowledge_runtime/middleware/auth.py, knowledge_runtime/knowledge_runtime/services/__init__.py, knowledge_runtime/knowledge_runtime/services/admin_executor.py, knowledge_runtime/knowledge_runtime/services/content_fetcher.py, knowledge_runtime/knowledge_runtime/services/index_executor.py, knowledge_runtime/knowledge_runtime/services/query_executor.py, knowledge_runtime/pyproject.toml, knowledge_runtime/start.sh, knowledge_runtime/tests/__init__.py, knowledge_runtime/tests/conftest.py, knowledge_runtime/tests/test_admin_executor.py, knowledge_runtime/tests/test_auth.py, knowledge_runtime/tests/test_content_fetcher.py, knowledge_runtime/tests/test_index_executor.py, knowledge_runtime/tests/test_logging.py, knowledge_runtime/tests/test_query_executor.py, shared/models/__init__.py, shared/models/knowledge_runtime_protocol.py, start.sh
💤 Files with no reviewable changes (12)
- backend/app/services/rag/local_data_plane/__init__.py
- backend/tests/core/test_config.py
- chat_shell/pyproject.toml
- backend/app/services/rag/embedding/__init__.py
- backend/tests/services/rag/test_local_data_plane_retrieval.py
- backend/tests/services/rag/test_local_gateway.py
- backend/app/services/rag/local_data_plane/administration.py
- backend/app/services/rag/local_data_plane/retrieval.py
- backend/tests/services/rag/test_local_data_plane_indexing.py
- backend/app/services/rag/local_data_plane/indexing.py
- backend/app/services/rag/local_gateway.py
- backend/app/services/rag/embedding/factory.py
```python
@pytest.mark.asyncio
async def test_gateway_no_auth_header_when_token_empty(mocker) -> None:
    """Test that gateway does not add Authorization header when token is empty."""
    post_mock = mocker.patch(
        "httpx.AsyncClient.post",
        return_value=_build_response(
            url="http://knowledge-runtime/internal/rag/query",
            status_code=200,
            json_body={"records": [], "total": 0, "total_estimated_tokens": 0},
        ),
    )
    gateway = RemoteRagGateway(
        base_url="http://knowledge-runtime",
        auth_token="",
    )
    spec = QueryRuntimeSpec(
        knowledge_base_ids=[1],
        query="test",
        knowledge_base_configs=[
            QueryKnowledgeBaseRuntimeConfig(
                knowledge_base_id=1,
                index_owner_user_id=1,
                retriever_config=RuntimeRetrieverConfig(
                    name="test",
                    namespace="default",
                    storage_config={"type": "elasticsearch", "url": "http://es:9200"},
                ),
                embedding_model_config=RuntimeEmbeddingModelConfig(
                    model_name="test",
                    model_namespace="default",
                    resolved_config={"protocol": "openai", "model_id": "test-model"},
                ),
                retrieval_config=RuntimeRetrievalConfig(),
            )
        ],
    )

    await gateway.query(spec)

    args, kwargs = post_mock.await_args
    # When token is empty, headers should be empty dict
    assert kwargs.get("headers") == {}
```
Potential brittleness: empty-string auth_token test relies on fallback to settings.INTERNAL_SERVICE_TOKEN being empty.
test_gateway_no_auth_header_when_token_empty does not reset settings.INTERNAL_SERVICE_TOKEN. If a developer has a non-empty value configured (or another test leaks state without teardown), the assertion kwargs.get("headers") == {} will fail depending on whether the gateway treats auth_token="" as "explicitly no auth" or falls back to the settings token. Consider using monkeypatch.setattr(settings, "INTERNAL_SERVICE_TOKEN", "") here too to isolate the test, mirroring test_gateway_uses_settings_token_when_not_provided.
🧰 Tools
🪛 Ruff (0.15.10)
[warning] 756-756: Unpacked variable args is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/tests/services/rag/test_remote_gateway.py` around lines 717 - 758,
The test test_gateway_no_auth_header_when_token_empty is brittle because it
assumes settings.INTERNAL_SERVICE_TOKEN is empty; before creating
RemoteRagGateway set settings.INTERNAL_SERVICE_TOKEN to "" (use
monkeypatch.setattr(settings, "INTERNAL_SERVICE_TOKEN", "")) to isolate the test
and ensure RemoteRagGateway(auth_token="") does not fall back to the settings
token—mirror the approach used in
test_gateway_uses_settings_token_when_not_provided to guarantee the headers
assertion remains deterministic.
```dockerfile
# Create log directory
RUN mkdir -p /app/logs

# Expose port
EXPOSE ${PORT}

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:${PORT}/internal/rag/health || exit 1

# Start command
CMD ["sh", "-c", "exec uvicorn knowledge_runtime.main:app --host ${HOST} --port ${PORT}"]
```
Run the runtime container as a non-root user.
The service currently starts as root. Add an unprivileged user and ensure /app/logs remains writable before CMD.
Container hardening example:

```diff
 # Create log directory
-RUN mkdir -p /app/logs
+RUN mkdir -p /app/logs && \
+    groupadd --system app && \
+    useradd --system --gid app --home-dir /app --shell /usr/sbin/nologin app && \
+    chown -R app:app /app

 # Expose port
 EXPOSE ${PORT}

 # Health check
 HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
     CMD curl -f http://localhost:${PORT}/internal/rag/health || exit 1

+USER app
+
 # Start command
 CMD ["sh", "-c", "exec uvicorn knowledge_runtime.main:app --host ${HOST} --port ${PORT}"]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docker/knowledge_runtime/Dockerfile` around lines 31 - 42, Add an
unprivileged user and switch to it before the container starts: create a
user/group (e.g., appuser), ensure /app and /app/logs are owned and writable by
that user (chown/chmod as needed), then add a USER appuser line prior to CMD so
uvicorn runs unprivileged; keep the existing HEALTHCHECK and EXPOSE lines
unchanged but ensure the health endpoint is reachable by the non-root user.
- **Service decoupling**: create a standalone `knowledge_runtime` service that Backend calls over HTTP
- **Independent scaling**: the RAG compute layer can scale horizontally on its own
- **Backward compatibility**: keep `LocalRagGateway` as a fallback to support gradual cutover
- **Protocol transparency**: reuse the existing `knowledge_runtime_protocol.py` communication models

---

## 架构概览 (Architecture Overview)

```
Backend (control plane)
  HTTP API        MCP Tools        Internal API
      └───────────────┬────────────────┘
                      ▼
             ┌─────────────────┐
             │  Orchestrator   │  ← permission checks, request routing, result assembly
             └────────┬────────┘
                      ▼
             ┌─────────────────┐
             │ Gateway Factory │  ← local/remote mode selection
             └────────┬────────┘
                      │
    ┌─────────────────┼─────────────────┐
    ▼                 ▼                 ▼
┌───────────────┐ ┌────────────────┐ ┌──────────┐
│LocalRagGateway│ │RemoteRagGateway│ │ Fallback │
│  (fallback)   │ │ (primary path) │ │ Handler  │
└───────────────┘ └───────┬────────┘ └──────────┘
```
Doc contradicts PR intent: still describes LocalRagGateway + RAG_RUNTIME_MODE as active components.
This PR's goal is to remove LocalRagGateway, RAG_RUNTIME_MODE, and the remote→local fallback handler (see backend/app/core/config.py, backend/app/services/rag/gateway_factory.py, and the updated tests). However, the design doc being added in the same PR still presents them as core pieces:
- Lines 27–28: "keep `LocalRagGateway` as a fallback, supporting gradual cutover"
- Lines 48–56: the architecture diagram includes `LocalRagGateway` and a `Fallback Handler`
- Line 89: "select local/remote based on `RAG_RUNTIME_MODE`"
- Lines 186–224: a full section on `RAG_RUNTIME_MODE` configuration, gradual-cutover phases, and the fallback flow
- Lines 280–281: references to `backend/app/services/rag/local_gateway.py` and `gateway_factory.py`'s local/remote mode selection; the former file is deleted in this PR
Please update the document to reflect the final state (remote-only, no RAG_RUNTIME_MODE, no local fallback) or, if this is intentionally a historical design record, move it under an explicitly dated/superseded location and add a banner noting it describes an intermediate phase. As the doc stands, a future reader reconciling it against the code will be actively misled.
🧰 Tools
🪛 markdownlint-cli2 (0.22.0)
[warning] 34-34: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
## 架构概览

(followed by an opening code fence with no language identifier)
Specify languages on fenced code blocks.
markdownlint MD040 flags the fenced blocks at lines 34, 210, and 249 for missing language identifiers. Consider using text (or a more specific identifier) for the ASCII diagrams and flow charts.
🧰 Tools
🪛 markdownlint-cli2 (0.22.0)
[warning] 34-34: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/plans/2026-04-19-knowledge-runtime-architecture.md` at line 34, The
fenced code blocks in the document containing ASCII diagrams and flow charts are
missing language identifiers (causing markdownlint MD040); update each fenced
block that contains ASCII art or flow charts to include an appropriate language
tag such as "text" or a more specific identifier (e.g., "graphviz" or "mermaid"
if applicable) so the blocks read as ```text (or ```graphviz/```mermaid) and
satisfy MD040 while preserving the diagrams' rendering.
```python
@retry(
    retry=retry_if_exception_type(httpx.TransportError),
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=1, max=10),
    reraise=True,
)
async def fetch(
    self,
    content_ref: ContentRef,
) -> tuple[bytes, str, str]:
    """Fetch binary content from a ContentRef.

    Args:
        content_ref: The content reference to fetch.

    Returns:
        Tuple of (binary_data, source_file, file_extension).

    Raises:
        ContentFetchError: If fetching fails.
    """
    if isinstance(content_ref, PresignedUrlContentRef):
        return await self._fetch_from_presigned_url(content_ref)
    elif isinstance(content_ref, BackendAttachmentStreamContentRef):
        return await self._fetch_from_backend_stream(content_ref)
    else:
        raise ContentFetchError(
            f"Unsupported content ref type: {type(content_ref).__name__}",
            retryable=False,
        )

async def _fetch_from_presigned_url(
    self,
    content_ref: PresignedUrlContentRef,
) -> tuple[bytes, str, str]:
    """Fetch content directly from a presigned URL.

    Args:
        content_ref: Presigned URL content reference.

    Returns:
        Tuple of (binary_data, source_file, file_extension).
    """
    url = content_ref.url
    logger.debug(f"Fetching content from presigned URL: {url[:100]}...")

    try:
        async with httpx.AsyncClient(
            timeout=self._settings.content_fetch_timeout
        ) as client:
            response = await client.get(url)
            response.raise_for_status()

            # Extract filename and extension from URL if possible
            source_file, file_extension = self._extract_filename_from_url(url)

            return response.content, source_file, file_extension

    except httpx.HTTPStatusError as exc:
        logger.error(f"HTTP error fetching from presigned URL: {exc}")
        raise ContentFetchError(
            f"Failed to fetch content: HTTP {exc.response.status_code}",
            retryable=exc.response.status_code >= 500,
            details={"url": url[:100], "status_code": exc.response.status_code},
        ) from exc
    except httpx.TransportError as exc:
        logger.error(f"Transport error fetching from presigned URL: {exc}")
        raise ContentFetchError(
            f"Transport error fetching content: {exc}",
            retryable=True,
            details={"url": url[:100]},
        ) from exc

async def _fetch_from_backend_stream(
    self,
    content_ref: BackendAttachmentStreamContentRef,
) -> tuple[bytes, str, str]:
    """Fetch content through Backend attachment stream endpoint.

    Args:
        content_ref: Backend attachment stream content reference.

    Returns:
        Tuple of (binary_data, source_file, file_extension).
    """
    url = content_ref.url
    auth_token = content_ref.auth_token

    logger.debug(f"Fetching content from Backend stream: {url}")

    try:
        headers = {"Authorization": f"Bearer {auth_token}"}

        async with httpx.AsyncClient(
            timeout=self._settings.content_fetch_timeout
        ) as client:
            response = await client.get(url, headers=headers)
            response.raise_for_status()

            # Try to get filename from Content-Disposition header
            source_file, file_extension = self._extract_filename_from_response(
                response, url
            )

            return response.content, source_file, file_extension

    except httpx.HTTPStatusError as exc:
        logger.error(f"HTTP error fetching from Backend stream: {exc}")
        raise ContentFetchError(
            f"Failed to fetch content from Backend: HTTP {exc.response.status_code}",
            retryable=exc.response.status_code >= 500,
            details={"url": url, "status_code": exc.response.status_code},
        ) from exc
    except httpx.TransportError as exc:
        logger.error(f"Transport error fetching from Backend stream: {exc}")
        raise ContentFetchError(
            f"Transport error fetching content from Backend: {exc}",
            retryable=True,
            details={"url": url},
        ) from exc
```
Retry decorator on fetch() is effectively dead code.
The @retry(retry=retry_if_exception_type(httpx.TransportError), ...) on fetch() (lines 56–61) never fires because the inner methods _fetch_from_presigned_url (line 121) and _fetch_from_backend_stream (line 169) already catch httpx.TransportError and re-raise it as ContentFetchError. By the time the exception reaches the decorator, it no longer matches httpx.TransportError, so tenacity never retries.
Pick one of:
- Retry on
ContentFetchErrorwithretryable=Trueinstead. - Let
httpx.TransportErrorpropagate from the inner methods (remove theexcept httpx.TransportErrorbranches) and wrap it intoContentFetchErroronly after retries are exhausted.
🛠️ Suggested fix (option 1)

```diff
 @retry(
-    retry=retry_if_exception_type(httpx.TransportError),
+    retry=retry_if_exception_type(ContentFetchError)
+    & retry_if_result_is_retryable,  # or a custom predicate
     stop=stop_after_attempt(3),
     wait=wait_exponential(multiplier=1, min=1, max=10),
     reraise=True,
 )
```

or, more idiomatically, use tenacity.retry_if_exception with a predicate that inspects exc.retryable:

```python
from tenacity import retry_if_exception

def _is_retryable(exc: BaseException) -> bool:
    return isinstance(exc, ContentFetchError) and exc.retryable

@retry(
    retry=retry_if_exception(_is_retryable),
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=1, max=10),
    reraise=True,
)
```

🤖 Prompt for AI Agents
)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@knowledge_runtime/knowledge_runtime/services/content_fetcher.py` around lines
56 - 175, The retry on fetch() currently only retries httpx.TransportError but
both _fetch_from_presigned_url and _fetch_from_backend_stream catch
TransportError and raise ContentFetchError, so tenacity never retries; update
fetch() to retry based on ContentFetchError.retryable by replacing the
decorator’s retry argument with a predicate (e.g., use
tenacity.retry_if_exception with a helper function _is_retryable that returns
isinstance(exc, ContentFetchError) and exc.retryable) so fetch() will retry when
inner methods raise ContentFetchError(retryable=True); keep the inner handlers
raising ContentFetchError as they are and only change the `@retry`(...) on fetch()
to reference the new predicate.
```toml
name = "wegent-knowledge-runtime"
version = "1.0.0"
description = "Wegent Knowledge Runtime HTTP Service"
requires-python = ">=3.10,<=3.13"
```
🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
python -m pip install --quiet packaging
python - <<'PY'
from packaging.specifiers import SpecifierSet
from packaging.version import Version

spec = SpecifierSet(">=3.10,<=3.13")
for version in ["3.13.0", "3.13.1", "3.13.9", "3.14.0"]:
    print(version, Version(version) in spec)
PY
```

Repository: wecode-ai/Wegent
Length of output: 110
**Use `<3.14` to include all Python 3.13 patch releases.**

`<=3.13` excludes versions like 3.13.1 and later patch releases; use `<3.14` to support Python 3.13 generally.

Suggested change

```diff
-requires-python = ">=3.10,<=3.13"
+requires-python = ">=3.10,<3.14"
```

🤖 Prompt for AI Agents
+requires-python = ">=3.10,<3.14"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@knowledge_runtime/pyproject.toml` at line 9, Update the requires-python spec
in pyproject.toml to allow all Python 3.13 patch releases by replacing the
current constraint requires-python = ">=3.10,<=3.13" with a range using an
exclusive upper bound, e.g. requires-python = ">=3.10,<3.14", so the
requires-python field correctly includes Python 3.13.x versions.
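The exclusion is easy to see without the `packaging` dependency. A simplified stand-in for PEP 440 specifier membership (plain release versions only, padded to three components because `3.13` and `3.13.0` compare equal under PEP 440):

```python
def parse(version: str) -> tuple[int, ...]:
    """Parse a plain release version, padding so '3.13' == '3.13.0'."""
    parts = [int(p) for p in version.split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def allowed_old(version: str) -> bool:
    """Mimics '>=3.10,<=3.13' for plain release versions."""
    return parse("3.10") <= parse(version) <= parse("3.13")


def allowed_new(version: str) -> bool:
    """Mimics the suggested '>=3.10,<3.14'."""
    return parse("3.10") <= parse(version) < parse("3.14")


print(allowed_old("3.13.0"))  # True
print(allowed_old("3.13.1"))  # False: (3, 13, 1) > (3, 13, 0)
print(allowed_new("3.13.1"))  # True
print(allowed_new("3.14.0"))  # False
```

The inclusive `<=3.13` bound compares against the full version tuple, which is why every 3.13 patch release after 3.13.0 falls outside it.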
```python
import pytest


@pytest.fixture
def mock_storage_backend():
    """Create a mock storage backend for testing."""
    from unittest.mock import MagicMock

    backend = MagicMock()
    backend.test_connection.return_value = True
    backend.retrieve.return_value = {"records": []}
    backend.delete_document.return_value = {"status": "success", "deleted_chunks": 0}
    backend.delete_knowledge.return_value = {"status": "success", "deleted_count": 0}
    backend.drop_knowledge_index.return_value = {"status": "success"}
    backend.get_all_chunks.return_value = []
    return backend


@pytest.fixture
def mock_embed_model():
    """Create a mock embedding model for testing."""
    from unittest.mock import MagicMock

    model = MagicMock()
    return model
```
**Add return type hints to the pytest fixtures.**

Both fixture functions are untyped; annotate them and move `MagicMock` to a normal import.

🧪 Proposed typing fix

```diff
+from unittest.mock import MagicMock
+
 import pytest


 @pytest.fixture
-def mock_storage_backend():
+def mock_storage_backend() -> MagicMock:
     """Create a mock storage backend for testing."""
-    from unittest.mock import MagicMock
-
     backend = MagicMock()
     backend.test_connection.return_value = True
     backend.retrieve.return_value = {"records": []}
@@
 @pytest.fixture
-def mock_embed_model():
+def mock_embed_model() -> MagicMock:
     """Create a mock embedding model for testing."""
-    from unittest.mock import MagicMock
-
-    model = MagicMock()
-    return model
+    return MagicMock()
```

As per coding guidelines, `**/*.py`: Follow PEP 8, use Black formatter (line length: 88), isort for import organization, and require type hints.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@knowledge_runtime/tests/conftest.py` around lines 7 - 31, Add explicit return
type hints and move the MagicMock import to the module top: import MagicMock
from unittest.mock once at the top of the file, then annotate the two fixtures
mock_storage_backend and mock_embed_model with return types (e.g., -> MagicMock)
so the fixtures are typed; keep fixture bodies identical but remove the inline
imports and ensure isort/Black formatting rules are followed.
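Either way, the fixture body is plain `MagicMock` configuration. A standalone sketch of what tests receive (written as a plain typed function rather than a pytest fixture so it runs anywhere; the call arguments are illustrative):

```python
from unittest.mock import MagicMock


def make_storage_backend() -> MagicMock:
    """Mirror of the fixture's setup as a plain typed function."""
    backend = MagicMock()
    backend.test_connection.return_value = True
    backend.retrieve.return_value = {"records": []}
    return backend


backend = make_storage_backend()
print(backend.test_connection())         # True
print(backend.retrieve("idx", top_k=3))  # {'records': []}

# MagicMock also records calls for later verification:
backend.retrieve.assert_called_once_with("idx", top_k=3)
```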
- Backend and chat_shell no longer depend on knowledge_engine or llama-index
- RAG operations go through RemoteRagGateway to the knowledge_runtime service
- Update Docker files to remove the knowledge_engine copy step
- Update stale comments
Actionable comments posted: 1
🧹 Nitpick comments (2)
backend/app/api/endpoints/adapter/retrievers.py (2)
**38-39: Add `@trace_async()` to the new async endpoints.**

Per coding guidelines, async functions in `backend/**/*.py` should be decorated with `@trace_async()` from `shared/telemetry/decorators.py`. The two endpoints converted to async here (`get_storage_retrieval_methods`, `get_storage_type_retrieval_methods`) are missing it, while `test_retriever_connection` below is also async and similarly untraced; worth addressing consistently.

As per coding guidelines: "Use OpenTelemetry tracing decorators: `@trace_async()` for async functions, `@trace_sync()` for sync functions from `shared/telemetry/decorators.py`".

Also applies to: 74-75
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/endpoints/adapter/retrievers.py` around lines 38 - 39, The async endpoints get_storage_retrieval_methods, get_storage_type_retrieval_methods and test_retriever_connection are missing OpenTelemetry tracing; import and apply the async tracing decorator by adding `@trace_async`() (from shared/telemetry/decorators.py) directly above each async def to ensure they are traced, and ensure the decorator import is added/merged with existing imports if not already present.
**296-296: Nit: use f-string conversion flag (RUF010).**

```diff
-    logger.error(f"Retriever connection test failed: {str(e)}")
+    logger.error(f"Retriever connection test failed: {e!s}")
```

Same applies to the identical line at 300.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/endpoints/adapter/retrievers.py` at line 296, Replace the f-string that calls str(e) in the logger.error call with the f-string conversion flag so the exception is logged using its repr; specifically update the logger.error invocation that currently reads "logger.error(f\"Retriever connection test failed: {str(e)}\")" (the two identical occurrences in retrievers.py) to use "{e!r}" in the f-string so the log becomes more accurate and stable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/api/endpoints/adapter/retrievers.py`:
- Around line 96-108: The comparison in the loop currently lowercases only the
input storage_type but compares it to type_info.type as-is, causing false
negatives for case differences; update the match in the loop that iterates
response.storage_types to perform a case-insensitive comparison (e.g., compare
storage_type.casefold() to type_info.type.casefold()) so supported types like
"Elasticsearch" will be detected, and ensure the returned "storage_type"
preserves the original requested value or normalized form as intended
(references: response.storage_types, type_info.type, storage_type in
retrievers.py).
---
Nitpick comments:
In `@backend/app/api/endpoints/adapter/retrievers.py`:
- Around line 38-39: The async endpoints get_storage_retrieval_methods,
get_storage_type_retrieval_methods and test_retriever_connection are missing
OpenTelemetry tracing; import and apply the async tracing decorator by adding
`@trace_async`() (from shared/telemetry/decorators.py) directly above each async
def to ensure they are traced, and ensure the decorator import is added/merged
with existing imports if not already present.
- Line 296: Replace the f-string that calls str(e) in the logger.error call with
the f-string conversion flag so the exception is logged using its repr;
specifically update the logger.error invocation that currently reads
"logger.error(f\"Retriever connection test failed: {str(e)}\")" (the two
identical occurrences in retrievers.py) to use "{e!r}" in the f-string so the
log becomes more accurate and stable.
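On the case-insensitivity fix above: `str.casefold()` is preferable to `lower()` because it also normalizes characters that `lower()` leaves distinct. A quick illustration:

```python
def matches(requested: str, available: str) -> bool:
    """Case-insensitive storage-type comparison, per the suggested fix."""
    return requested.casefold() == available.casefold()


print(matches("Elasticsearch", "elasticsearch"))  # True
print(matches("QDRANT", "qdrant"))                # True

# casefold() is more aggressive than lower() for non-ASCII input:
print("Straße".casefold() == "STRASSE".casefold())  # True; lower() would miss this
```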
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3e63f4cb-65c8-42f5-9139-64f36dae973c
📒 Files selected for processing (6)
- AGENTS.md
- backend/app/api/api.py
- backend/app/api/endpoints/adapter/retrievers.py
- backend/app/api/endpoints/internal/__init__.py
- docker/backend/Dockerfile
- docker/standalone/Dockerfile
💤 Files with no reviewable changes (1)
- docker/backend/Dockerfile
✅ Files skipped from review due to trivial changes (3)
- backend/app/api/endpoints/internal/__init__.py
- backend/app/api/api.py
- AGENTS.md
- backend: make storage type comparison case-insensitive
- knowledge_runtime: fix content fetcher retry logic
- knowledge_runtime: fix the script working directory in start.sh
- start.sh: export INTERNAL_SERVICE_TOKEN when starting backend
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
start.sh (2)
**1016-1055: 🛠️ Refactor suggestion | 🟠 Major**

**Extract service metadata instead of extending the monolithic script.**

`knowledge_runtime` is now wired through repeated service lists, aliases, ports, dependency sync, startup, health, status, and log output. This makes each future service change require edits in many places across an already 1,700+ line script. Consider a single service registry/helper layer with fields like `name`, `aliases`, `port`, `dir`, `start_cmd`, and `health_path`.

As per coding guidelines, "File size limit: If a file exceeds 1000 lines, split it into multiple sub-modules" and "Extract common logic from ALL code."
Also applies to: 1251-1251, 1349-1389, 1451-1706
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@start.sh` around lines 1016 - 1055, Replace the repeated per-service lists and case branches with a single service registry data structure (e.g., an associative array or arrays of service objects) that holds fields like name, aliases, port, dir, start_cmd, and health_path; update the stop logic that currently uses all_services/all_ports and the parsing case (references: all_services, all_ports, services, service_ports, and the knowledge_runtime/aliases) to look up services by alias and populate services/service_ports from the registry, then refactor all other repeated sections (dependency sync, startup, health checks, status, logs) to iterate the same registry keys instead of duplicating code; finally, move the registry and service-agnostic helper functions into a new sourced module to respect the file-size guideline (split the monolithic start.sh into main + services.sh and update callers to source the new module).
**1316-1323: ⚠️ Potential issue | 🟠 Major**

**Add `--fail` flag and authentication to health check curl commands.**

The current health check uses `curl -s`, which exits with success (code 0) for HTTP 401 and 500 responses. This masks authentication failures and service errors: the knowledge runtime health check could report healthy while the service is actually rejecting requests. Add the `--fail` flag to make curl fail on HTTP errors, and include the `INTERNAL_SERVICE_TOKEN` authorization header when configured.

Suggested fix

```diff
     echo -n "  Checking $name..."
+    local curl_args=(-fsS --connect-timeout 2)
+    if [ -n "$INTERNAL_SERVICE_TOKEN" ]; then
+        curl_args+=(-H "Authorization: Bearer $INTERNAL_SERVICE_TOKEN")
+    fi
     for ((i=1; i<=max_retries; i++)); do
         # Try health endpoint first if provided
         if [ -n "$health_path" ]; then
-            if curl -s --connect-timeout 2 "http://localhost:$port$health_path" >/dev/null 2>&1; then
+            if curl "${curl_args[@]}" "http://localhost:$port$health_path" >/dev/null 2>&1; then
                 echo -e " ${GREEN}✓${NC} healthy (port $port)"
                 return 0
             fi
@@
-    check_service_health "knowledge_runtime" $KNOWLEDGE_RUNTIME_PORT "/internal/rag/health" || failed=1
+    check_service_health "knowledge_runtime" "$KNOWLEDGE_RUNTIME_PORT" "/internal/rag/health" || failed=1
```

Also applies to: 1652-1654
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@start.sh` around lines 1316 - 1323, Update the health-check curl invocations in the retry loop that checks "$health_path" (uses variables health_path, port, max_retries) to pass --fail so HTTP 4xx/5xx cause a non-zero exit, and include an Authorization header when INTERNAL_SERVICE_TOKEN is set (e.g., add -H "Authorization: Bearer $INTERNAL_SERVICE_TOKEN" only if INTERNAL_SERVICE_TOKEN is non-empty). Make the same change to the other health-check curl usage later in the script (the block around the second occurrence referenced by the reviewer). Ensure the command preserves existing flags like --connect-timeout and silent mode.
🧹 Nitpick comments (3)
knowledge_runtime/knowledge_runtime/services/content_fetcher.py (2)
**192-202: Minor: filenames from presigned URLs are not percent-decoded.**

Object-storage keys with spaces or non-ASCII characters come back URL-encoded (e.g. `my%20report.pdf`), and the returned `source_file` will keep the encoding. Consider `urllib.parse.unquote(filename)` before returning.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_runtime/knowledge_runtime/services/content_fetcher.py` around lines 192 - 202, The filename extracted from a URL (variables path, filename, extension and the return) can be percent-encoded (e.g. "my%20report.pdf"); decode it before computing or returning the extension. Use urllib.parse.unquote on the extracted filename (after parts[-1]) so filename becomes human-readable, then derive extension from that decoded filename and return the decoded filename and extension.
**61-85: Missing OpenTelemetry tracing on async methods.**

Per repo guidelines, async functions should be decorated with `@trace_async()` from `shared/telemetry/decorators.py`. The three async methods here (`fetch`, `_fetch_from_presigned_url`, `_fetch_from_backend_stream`) perform network I/O and are exactly the spans you'll want when debugging ingestion failures, but none are traced. Worth adding, plus `set_span_attribute`/`add_span_event` for `content_ref.kind`, status codes, and retry attempts.

As per coding guidelines: "Use OpenTelemetry tracing decorators from 'shared/telemetry/decorators.py': '@trace_async()' for async functions, '@trace_sync()' for sync functions, 'add_span_event()' for events, 'set_span_attribute()' for attributes".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_runtime/knowledge_runtime/services/content_fetcher.py` around lines 61 - 85, Decorate the three async methods fetch, _fetch_from_presigned_url, and _fetch_from_backend_stream with `@trace_async`() from shared/telemetry/decorators.py and instrument them by calling set_span_attribute for content_ref.kind (and any returned status/code) and add_span_event for retry attempts and key events (e.g., request start/finish and error details); ensure the imports for trace_async, set_span_attribute, and add_span_event are added and that exceptions still propagate (reraise behavior unchanged) so existing retry logic on fetch remains intact.

backend/app/api/endpoints/adapter/retrievers.py (1)
**38-39: Add tracing and return annotations to the changed async endpoints.**

These route handlers are now async, but they lack `@trace_async()` and explicit return types.

Proposed update

```diff
-from typing import Optional
+from typing import Any, Optional
@@
 from shared.models import RuntimeRetrieverConfig
+from shared.telemetry.decorators import trace_async
@@
 @router.get("/storage-types/retrieval-methods")
-async def get_storage_retrieval_methods():
+@trace_async()
+async def get_storage_retrieval_methods() -> dict[str, Any]:
@@
 @router.get("/storage-types/{storage_type}/retrieval-methods")
-async def get_storage_type_retrieval_methods(storage_type: str):
+@trace_async()
+async def get_storage_type_retrieval_methods(storage_type: str) -> dict[str, Any]:
```

As per coding guidelines, Python code: Follow PEP 8, use Black formatter (line length: 88), isort, and type hints are required; Use OpenTelemetry tracing decorators from 'shared/telemetry/decorators.py': '@trace_async()' for async functions, '@trace_sync()' for sync functions, 'add_span_event()' for events, 'set_span_attribute()' for attributes.
Also applies to: 74-75
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/endpoints/adapter/retrievers.py` around lines 38 - 39, Add the OpenTelemetry async tracer and explicit return type annotations to the async route handlers: decorate get_storage_retrieval_methods (and the other async handler at the referenced lines) with `@trace_async`() from shared.telemetry.decorators, add an appropriate explicit return type hint (for example -> list[str] or -> dict[str, Any] depending on the actual returned payload) to each async function signature, and import trace_async at the top of the module; then run Black/isort to satisfy formatting and import ordering.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/api/endpoints/adapter/retrievers.py`:
- Line 18: Replace direct instantiation of RemoteRagGateway() with the existing
gateway factory used by test_retriever_connection (use the same factory call
rather than new RemoteRagGateway()), and wrap calls to get_storage_types() (and
any other remote gateway calls at the three locations noted) in try/except that
catches RemoteRagGatewayError and maps it to a 503 response (e.g., raise/return
an HTTP 503 with a clear message) instead of allowing a generic 500; reference
RemoteRagGateway, RemoteRagGatewayError and get_storage_types to locate the
changes.
- Line 296: Replace f-string interpolation inside logger calls with lazy
logging: change the call using logger.error(f"Retriever connection test failed:
{str(e)}") to logger.error("Retriever connection test failed: %s", e). For the
non-logger usage mentioned (the line 301 case), avoid f-strings by building the
message into a variable first (e.g., msg = "Retriever connection test failed:
{}".format(e) or msg = "Retriever connection test failed: " + str(e)) and then
use that variable where needed (or pass it to raise/return) so no f-string is
used outside logger calls; target the logger.error call and the surrounding code
that references e.
In `@knowledge_runtime/knowledge_runtime/services/content_fetcher.py`:
- Around line 218-229: The Content-Disposition parsing in content_fetcher.py
using content_disposition.split("filename=") is fragile: it mishandles RFC5987
filename* and leaves trailing parameters in the extracted filename, corrupting
filename and extension that flow into index_document_from_binary via
IndexExecutor.execute; replace the hand-rolled logic around content_disposition
/ filename extraction with a robust parser (e.g.,
email.message.Message().get_param('filename', header='content-disposition') or
werkzeug.http.parse_options_header) that correctly handles filename* (RFC5987),
quoted values, and strips subsequent parameters, then derive extension from the
sanitized filename (ensure you only consider the actual filename string when
computing extension) so downstream format detection receives clean values.
In `@knowledge_runtime/start.sh`:
- Around line 58-60: The --backend-url handling currently only updates
BACKEND_URL variable used for display but does not export or persist it when an
existing .env is present, so the existing BACKEND_INTERNAL_URL in the file wins
at runtime; modify the start.sh logic that parses the --backend-url option (the
BACKEND_URL variable and the code paths around exporting BACKEND_INTERNAL_URL)
to always export BACKEND_INTERNAL_URL from the parsed BACKEND_URL value and/or
overwrite the .env entry regardless of whether .env already exists so the flag
takes effect; ensure you update the same handling used in the other occurrences
(around lines handling BACKEND_URL, BACKEND_INTERNAL_URL at 261-280 and 294-295)
so the flag is effective in all code paths.
- Around line 216-229: The startup script currently auto-installs uv by piping a
remote installer into sh when uv is missing; remove that auto-execution and
instead fail fast with clear manual install instructions. In the uv
presence-check block (the conditional using command -v uv), remove the curl ...
| sh invocation and the subsequent sourcing of $HOME/.cargo/env, and replace
them with an explicit error/warning message that prints step-by-step install
instructions (e.g., how to install via the official installer or package
manager) and then exit 1; keep the existing pre-checks and final missing-uv
error path but do not perform any network-executed installs in start.sh.
- Around line 54-56: The script parses the --python arg into PYTHON_PATH but
later invokes the virtualenv creation via the uv venv command without passing
that interpreter; update the uv venv invocation (the command that creates the
venv) to include the user-specified interpreter flag (e.g., add --python
"$PYTHON_PATH" or --python "$PYTHON_EXEC" consistent with the parser) so uv venv
uses the validated interpreter provided to --python.
- Around line 147-150: The current check uses sort -V (which fails on macOS);
replace that logic by invoking the detected python ($python_exec) to compare
REQUIRED_VERSION and PYTHON_VERSION using Python-native version parsing (e.g.,
split on '.' and compare integer tuples or use packaging/distutils if available)
and make the shell conditional depend on that process exit code; update the
block that references PYTHON_VERSION, REQUIRED_VERSION and need_exit to call
$python_exec -c "..." (return 0 if PYTHON_VERSION >= REQUIRED_VERSION, non‑zero
otherwise) and keep the existing need_exit branch unchanged so macOS-compatible
comparisons are used.
In `@start.sh`:
- Around line 58-61: The save_config logic currently removes
KNOWLEDGE_RUNTIME_PORT but leaves an existing KNOWLEDGE_RUNTIME_URL unchanged,
causing URL/port drift; update the script (the save_config/port-writing sections
that include the grep -v lines and the blocks around lines referenced) to also
remove any existing KNOWLEDGE_RUNTIME_URL entry when rewriting .env and then
write a new KNOWLEDGE_RUNTIME_URL derived from the configured
KNOWLEDGE_RUNTIME_PORT (e.g., construct
"http://localhost:${KNOWLEDGE_RUNTIME_PORT}" or use the existing host variable
if present) so the URL always matches the saved port; apply the same change to
the other similar blocks noted (around lines 81-104 and 894-905) so all places
that update KNOWLEDGE_RUNTIME_PORT also refresh KNOWLEDGE_RUNTIME_URL.
---
Outside diff comments:
In `@start.sh`:
- Around line 1016-1055: Replace the repeated per-service lists and case
branches with a single service registry data structure (e.g., an associative
array or arrays of service objects) that holds fields like name, aliases, port,
dir, start_cmd, and health_path; update the stop logic that currently uses
all_services/all_ports and the parsing case (references: all_services,
all_ports, services, service_ports, and the knowledge_runtime/aliases) to look
up services by alias and populate services/service_ports from the registry, then
refactor all other repeated sections (dependency sync, startup, health checks,
status, logs) to iterate the same registry keys instead of duplicating code;
finally, move the registry and service-agnostic helper functions into a new
sourced module to respect the file-size guideline (split the monolithic start.sh
into main + services.sh and update callers to source the new module).
- Around line 1316-1323: Update the health-check curl invocations in the retry
loop that checks "$health_path" (uses variables health_path, port, max_retries)
to pass --fail so HTTP 4xx/5xx cause a non-zero exit, and include an
Authorization header when INTERNAL_SERVICE_TOKEN is set (e.g., add -H
"Authorization: Bearer $INTERNAL_SERVICE_TOKEN" only if INTERNAL_SERVICE_TOKEN
is non-empty). Make the same change to the other health-check curl usage later
in the script (the block around the second occurrence referenced by the
reviewer). Ensure the command preserves existing flags like --connect-timeout
and silent mode.
---
Nitpick comments:
In `@backend/app/api/endpoints/adapter/retrievers.py`:
- Around line 38-39: Add the OpenTelemetry async tracer and explicit return type
annotations to the async route handlers: decorate get_storage_retrieval_methods
(and the other async handler at the referenced lines) with `@trace_async`() from
shared.telemetry.decorators, add an appropriate explicit return type hint (for
example -> list[str] or -> dict[str, Any] depending on the actual returned
payload) to each async function signature, and import trace_async at the top of
the module; then run Black/isort to satisfy formatting and import ordering.
In `@knowledge_runtime/knowledge_runtime/services/content_fetcher.py`:
- Around line 192-202: The filename extracted from a URL (variables path,
filename, extension and the return) can be percent-encoded (e.g.
"my%20report.pdf"); decode it before computing or returning the extension. Use
urllib.parse.unquote on the extracted filename (after parts[-1]) so filename
becomes human-readable, then derive extension from that decoded filename and
return the decoded filename and extension.
- Around line 61-85: Decorate the three async methods fetch,
_fetch_from_presigned_url, and _fetch_from_backend_stream with `@trace_async`()
from shared/telemetry/decorators.py and instrument them by calling
set_span_attribute for content_ref.kind (and any returned status/code) and
add_span_event for retry attempts and key events (e.g., request start/finish and
error details); ensure the imports for trace_async, set_span_attribute, and
add_span_event are added and that exceptions still propagate (reraise behavior
unchanged) so existing retry logic on fetch remains intact.
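On the Content-Disposition point: the stdlib `email.message.Message` parser already handles quoted values, trailing parameters, and RFC 2231/5987 `filename*`, so the hand-rolled `split("filename=")` can be replaced with something like this sketch (the header values are illustrative):

```python
from email.message import Message
from typing import Optional


def filename_from_disposition(header: str) -> Optional[str]:
    """Parse Content-Disposition via the stdlib instead of split('filename=')."""
    msg = Message()
    msg["Content-Disposition"] = header
    # get_filename() strips quotes, ignores other params like size=,
    # and decodes RFC 2231-encoded filename* values.
    return msg.get_filename()


print(filename_from_disposition('attachment; filename="my report.pdf"; size=123'))
# my report.pdf
print(filename_from_disposition("attachment; filename*=utf-8''my%20r%C3%A9port.pdf"))
# my réport.pdf
```

Deriving the extension from this sanitized value gives downstream format detection a clean input.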
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 02f0511a-8df5-49f5-9ac3-70f35d37cebe
📒 Files selected for processing (4)
- backend/app/api/endpoints/adapter/retrievers.py
- knowledge_runtime/knowledge_runtime/services/content_fetcher.py
- knowledge_runtime/start.sh
- start.sh
```diff
 from app.services.rag.gateway_factory import get_query_gateway
-from app.services.rag.local_gateway import LocalRagGateway
-from app.services.rag.remote_gateway import RemoteRagGatewayError
+from app.services.rag.remote_gateway import RemoteRagGateway, RemoteRagGatewayError
```
Use the gateway factory and map remote storage-type failures.
Directly constructing RemoteRagGateway() bypasses the singleton/factory path already used by test_retriever_connection. Also, get_storage_types() failures currently bubble as generic 500s; map RemoteRagGatewayError to a clear 503.
Proposed update

```diff
-from app.services.rag.remote_gateway import RemoteRagGateway, RemoteRagGatewayError
+from app.services.rag.remote_gateway import RemoteRagGatewayError
@@
 def _check_rag_available():
@@
     )
+
+
+async def _get_storage_types_or_503():
+    """Fetch storage types from knowledge_runtime or map failures to 503."""
+    try:
+        return await get_query_gateway().get_storage_types()
+    except RemoteRagGatewayError as exc:
+        logger.warning(
+            "Failed to fetch storage types from knowledge_runtime",
+            extra={"error_type": type(exc).__name__},
+        )
+        raise HTTPException(
+            status_code=503,
+            detail="RAG knowledge runtime is unavailable",
+        ) from exc
@@
-    gateway = RemoteRagGateway()
-    response = await gateway.get_storage_types()
+    response = await _get_storage_types_or_503()
@@
-    gateway = RemoteRagGateway()
-    response = await gateway.get_storage_types()
+    response = await _get_storage_types_or_503()
```

Also applies to: 58-59, 93-94
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/api/endpoints/adapter/retrievers.py` at line 18, Replace direct
instantiation of RemoteRagGateway() with the existing gateway factory used by
test_retriever_connection (use the same factory call rather than new
RemoteRagGateway()), and wrap calls to get_storage_types() (and any other remote
gateway calls at the three locations noted) in try/except that catches
RemoteRagGatewayError and maps it to a 503 response (e.g., raise/return an HTTP
503 with a clear message) instead of allowing a generic 500; reference
RemoteRagGateway, RemoteRagGatewayError and get_storage_types to locate the
changes.
```python
        return {"success": False, "message": str(e)}

    except RemoteRagGatewayError as e:
        logger.error(f"Retriever connection test failed: {str(e)}")
```
🧩 Analysis chain

🏁 Script executed:

```bash
# Verify Ruff no longer reports RUF010 for this file.
uv run ruff check backend/app/api/endpoints/adapter/retrievers.py --select RUF010
```

Repository: wecode-ai/Wegent
Length of output: 102

🏁 Script executed:

```bash
# First, examine the specific file and line
cat -n backend/app/api/endpoints/adapter/retrievers.py | sed -n '290,305p'
```

Repository: wecode-ai/Wegent
Length of output: 618

🏁 Script executed:

```bash
# Check if ruff is available as a direct command
which ruff && ruff --version
```

Repository: wecode-ai/Wegent
Length of output: 105

🏁 Script executed:

```bash
# If ruff is available, check for RUF010 violations in the file
ruff check backend/app/api/endpoints/adapter/retrievers.py --select RUF010 2>&1 || echo "Command failed or no violations found"
```

Repository: wecode-ai/Wegent
Length of output: 1868

🏁 Script executed:

```bash
# Get more context around line 301 to understand the fix needed
cat -n backend/app/api/endpoints/adapter/retrievers.py | sed -n '299,302p'
```

Repository: wecode-ai/Wegent
Length of output: 252
**Fix the Ruff RUF010 logging warnings.**

Use lazy logging for logger calls instead of f-string interpolation. Line 301 requires a different approach since it's not a logger call.

Proposed updates

```diff
     except RemoteRagGatewayError as e:
-        logger.error(f"Retriever connection test failed: {str(e)}")
+        logger.error("Retriever connection test failed: %s", e)
         return {"success": False, "message": str(e)}
     except Exception as e:
-        logger.error(f"Retriever connection test failed: {str(e)}")
-        return {"success": False, "message": f"Connection failed: {str(e)}"}
+        logger.error("Retriever connection test failed: %s", e)
+        return {"success": False, "message": f"Connection failed: {e!s}"}
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
    except RemoteRagGatewayError as e:
        logger.error("Retriever connection test failed: %s", e)
        return {"success": False, "message": str(e)}
    except Exception as e:
        logger.error("Retriever connection test failed: %s", e)
        return {"success": False, "message": f"Connection failed: {e!s}"}
```
🧰 Tools
🪛 Ruff (0.15.10)
[warning] 296-296: Use explicit conversion flag
Replace with conversion flag
(RUF010)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/api/endpoints/adapter/retrievers.py` at line 296, Replace
f-string interpolation inside logger calls with lazy logging: change the call
using logger.error(f"Retriever connection test failed: {str(e)}") to
logger.error("Retriever connection test failed: %s", e). For the non-logger
usage mentioned (the line 301 case), avoid f-strings by building the message
into a variable first (e.g., msg = "Retriever connection test failed:
{}".format(e) or msg = "Retriever connection test failed: " + str(e)) and then
use that variable where needed (or pass it to raise/return) so no f-string is
used outside logger calls; target the logger.error call and the surrounding code
that references e.
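The cost difference behind the lazy-logging advice can be demonstrated in isolation. This sketch is not the project's code; it simply counts how often string formatting runs when a log level is disabled:

```python
import logging

logging.basicConfig(level=logging.ERROR)
logger = logging.getLogger("demo")

class Expensive:
    """Counts how many times it is rendered to a string."""
    calls = 0
    def __str__(self):
        Expensive.calls += 1
        return "expensive"

e = Expensive()

logger.debug(f"value: {e}")   # f-string is evaluated even though DEBUG is off
logger.debug("value: %s", e)  # lazy form: formatting is skipped entirely

print(Expensive.calls)  # 1 -- only the f-string paid the rendering cost
```

The lazy form also lets log aggregators group records by the constant format string rather than by fully rendered messages.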
        --backend-url)
            BACKEND_URL="$2"
            shift 2
Apply --backend-url even when .env already exists.
When .env exists, --backend-url only changes the displayed value; it is never exported as BACKEND_INTERNAL_URL and the existing .env value wins at runtime. This makes the CLI flag ineffective for the common case.
Suggested fix
else
echo -e "${GREEN}✓ .env file already exists${NC}"
fi
+
+# Command-line value should override .env for this process.
+export BACKEND_INTERNAL_URL="$BACKEND_URL"
# Set PYTHONPATH to include project root
PROJECT_ROOT=$(cd .. && pwd)

Also applies to: 261-280, 294-295
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@knowledge_runtime/start.sh` around lines 58 - 60, The --backend-url handling
currently only updates BACKEND_URL variable used for display but does not export
or persist it when an existing .env is present, so the existing
BACKEND_INTERNAL_URL in the file wins at runtime; modify the start.sh logic that
parses the --backend-url option (the BACKEND_URL variable and the code paths
around exporting BACKEND_INTERNAL_URL) to always export BACKEND_INTERNAL_URL
from the parsed BACKEND_URL value and/or overwrite the .env entry regardless of
whether .env already exists so the flag takes effect; ensure you update the same
handling used in the other occurrences (around lines handling BACKEND_URL,
BACKEND_INTERNAL_URL at 261-280 and 294-295) so the flag is effective in all
code paths.
PYTHON_VERSION=$($python_exec --version 2>&1 | cut -d' ' -f2 | cut -d'.' -f1,2)

if [ "$(printf '%s\n' "$REQUIRED_VERSION" "$PYTHON_VERSION" | sort -V | head -n1)" != "$REQUIRED_VERSION" ]; then
    if [ "$need_exit" = "true" ]; then
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verification: run on macOS and Linux to confirm whether the current sort supports -V.
# Expected on macOS/BSD sort: non-zero exit or "illegal option" output.
printf '%s\n' "3.10" "3.11" | sort -V
Length of output: 69
🏁 Script executed:
# First, let's look at the file structure
cd knowledge_runtime && ls -la
Length of output: 559
🏁 Script executed:
# Check the exact content around lines 147-150
head -n 160 knowledge_runtime/start.sh | tail -n 30
Length of output: 1064
🏁 Script executed:
# Search for macOS-specific handling in the script
rg -n "macOS|darwin|Darwin|uname" knowledge_runtime/start.sh
Length of output: 125
🌐 Web query:
BSD sort macOS -V flag version compatibility 2024 2025
💡 Result:
macOS's BSD sort (/usr/bin/sort) does not support the -V (version sort) flag in 2024, 2025, or as of 2026-04-21. This GNU coreutils extension for natural sorting of version numbers (e.g., 1.10 before 1.2) is absent from BSD implementations. Man pages for macOS sort (OSX 10.5-10.6 and general) list options like -b, -d, -f, -g, -i, -M, -n, -r, -k, -t, -z, but no -V or --version-sort. Stack Overflow discussions from 2010-2022 confirm -V fails on macOS, recommending workarounds like multi-key numeric sorts (e.g., sort -t. -k1,1n -k2,2n -k3,3n) or GNU sort via Homebrew (brew install coreutils; gsort -V). No evidence of Apple adding -V in Sequoia (15, 2024), Sonoma (14, 2023), or later versions. FreeBSD sort man page (macOS base) explicitly includes -V as a non-POSIX extension, but Apple's version omits it. Use gsort from coreutils for -V compatibility.
Citations:
- 1: https://stackoverflow.com/questions/4265284/how-to-do-sort-v-in-osx
- 2: https://www.unix.com/man-page/osx/1/sort?os=osx&section=1&query=sort
- 3: https://superuser.com/questions/1087057/sort-v-on-mac-os-not-working
- 4: https://stackoverflow.com/questions/21394536/how-to-simulate-sort-v-on-macos
- 5: https://www.unix.com/man-page/osx/1/sort/
- 6: https://manp.gs/mac/1/sort
- 7: https://www.manpagez.com/man/1/sort/osx-10.5.3.php
- 8: https://www.manpagez.com/man/1/sort/osx-10.6.php
- 9: https://www.ss64.com/osx/sort.html
- 10: https://apple.stackexchange.com/questions/381772/is-there-a-way-to-change-the-order-in-which-the-finder-alphabetizes
- 11: https://apple.stackexchange.com/questions/459986/to-sort-files-case-sensitively
- 12: https://www.brew.sh
🏁 Script executed:
# Check more context around the function to understand the environment
head -n 200 knowledge_runtime/start.sh | tail -n 80
Length of output: 2997
🏁 Script executed:
# Verify if Python path is already available at this point in the script
grep -n "PYTHON_PATH\|python_exec\|python" knowledge_runtime/start.sh | head -n 20
Length of output: 1137
Use Python for version comparison instead of sort -V to ensure macOS compatibility.
The script explicitly handles macOS but relies on sort -V at line 148, which is unsupported by BSD sort on macOS. This causes valid Python installations to be rejected on macOS. Replace the sort-based comparison with Python's native version comparison:
Suggested fix
- PYTHON_VERSION=$($python_exec --version 2>&1 | cut -d' ' -f2 | cut -d'.' -f1,2)
+ PYTHON_VERSION=$("$python_exec" -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')
- if [ "$(printf '%s\n' "$REQUIRED_VERSION" "$PYTHON_VERSION" | sort -V | head -n1)" != "$REQUIRED_VERSION" ]; then
+ if "$python_exec" -c "import sys; required = tuple(map(int, '$REQUIRED_VERSION'.split('.'))); sys.exit(0 if sys.version_info[:2] >= required else 1)"; then
+ return 0
+ else
if [ "$need_exit" = "true" ]; then
echo -e "${RED}Error: Python $REQUIRED_VERSION or higher is required (found $PYTHON_VERSION)${NC}"
exit 1
fi
return 1
+ fi

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@knowledge_runtime/start.sh` around lines 147 - 150, The current check uses
sort -V (which fails on macOS); replace that logic by invoking the detected
python ($python_exec) to compare REQUIRED_VERSION and PYTHON_VERSION using
Python-native version parsing (e.g., split on '.' and compare integer tuples or
use packaging/distutils if available) and make the shell conditional depend on
that process exit code; update the block that references PYTHON_VERSION,
REQUIRED_VERSION and need_exit to call $python_exec -c "..." (return 0 if
PYTHON_VERSION >= REQUIRED_VERSION, non‑zero otherwise) and keep the existing
need_exit branch unchanged so macOS-compatible comparisons are used.
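The Python-native comparison the fix proposes can be sanity-checked on its own. The helper below is illustrative (meets_required is not a function in the repo); it shows why integer tuples beat both string comparison and a missing sort -V:

```python
import sys

def meets_required(required: str) -> bool:
    # Compare integer tuples, e.g. (3, 10) >= (3, 9); this avoids the
    # lexicographic trap where "3.9" sorts after "3.10" as a string.
    return sys.version_info[:2] >= tuple(map(int, required.split(".")))

print("3.10" >= "3.9")    # False: string comparison gets versions wrong
print((3, 10) >= (3, 9))  # True: tuple comparison is correct
print(meets_required("3.0"))  # True on any Python 3 interpreter
```

Because the comparison runs inside the interpreter being probed, it works identically with BSD and GNU userlands.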
if ! command -v uv &> /dev/null; then
    echo -e "${YELLOW}Warning: uv is not installed${NC}"
    echo -e "${YELLOW}Installing uv...${NC}"
    curl -LsSf https://astral.sh/uv/install.sh | sh

    if [ -f "$HOME/.cargo/env" ]; then
        source "$HOME/.cargo/env"
    fi

    if ! command -v uv &> /dev/null; then
        echo -e "${RED}Error: Failed to install uv${NC}"
        exit 1
    fi
fi
Avoid auto-executing a remote installer from the startup script.
curl ... | sh executes network-fetched code during service startup. This is a supply-chain risk and is inconsistent with the root script, which prints install instructions instead of auto-installing. Prefer failing with explicit instructions.
Suggested fix
if ! command -v uv &> /dev/null; then
- echo -e "${YELLOW}Warning: uv is not installed${NC}"
- echo -e "${YELLOW}Installing uv...${NC}"
- curl -LsSf https://astral.sh/uv/install.sh | sh
-
- if [ -f "$HOME/.cargo/env" ]; then
- source "$HOME/.cargo/env"
- fi
-
- if ! command -v uv &> /dev/null; then
- echo -e "${RED}Error: Failed to install uv${NC}"
- exit 1
- fi
+ echo -e "${RED}Error: uv is not installed${NC}"
+ echo -e "${YELLOW}Install uv, then rerun this script:${NC}"
+ echo -e " ${BLUE}curl -LsSf https://astral.sh/uv/install.sh | sh${NC}"
+ exit 1
fi🧰 Tools
🪛 Shellcheck (0.11.0)
[info] 222-222: Not following: ./.cargo/env was not specified as input (see shellcheck -x).
(SC1091)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@knowledge_runtime/start.sh` around lines 216 - 229, The startup script
currently auto-installs uv by piping a remote installer into sh when uv is
missing; remove that auto-execution and instead fail fast with clear manual
install instructions. In the uv presence-check block (the conditional using
command -v uv), remove the curl ... | sh invocation and the subsequent sourcing
of $HOME/.cargo/env, and replace them with an explicit error/warning message
that prints step-by-step install instructions (e.g., how to install via the
official installer or package manager) and then exit 1; keep the existing
pre-checks and final missing-uv error path but do not perform any
network-executed installs in start.sh.
        grep -v "^KNOWLEDGE_RUNTIME_PORT=" | \
        grep -v "^WEGENT_FRONTEND_PORT=" | \
        grep -v "^EXECUTOR_IMAGE=" | \
        grep -v "^WEGENT_SOCKET_URL=" > "$temp_file" || true
Keep KNOWLEDGE_RUNTIME_URL in sync with the configured port.
save_config writes KNOWLEDGE_RUNTIME_PORT but preserves any existing KNOWLEDGE_RUNTIME_URL, and CLI -k/--knowledge-runtime-port only changes the port. If .env already contains KNOWLEDGE_RUNTIME_URL=http://localhost:8200 and the user switches to port 8201, the runtime starts on 8201 while the backend still calls 8200.
Suggested fix
grep -v "^EXECUTOR_MANAGER_PORT=" | \
grep -v "^KNOWLEDGE_RUNTIME_PORT=" | \
+ grep -v "^KNOWLEDGE_RUNTIME_URL=" | \
grep -v "^WEGENT_FRONTEND_PORT=" | \
@@
KNOWLEDGE_RUNTIME_PORT=$KNOWLEDGE_RUNTIME_PORT
+KNOWLEDGE_RUNTIME_URL=http://localhost:$KNOWLEDGE_RUNTIME_PORT
WEGENT_FRONTEND_PORT=$WEGENT_FRONTEND_PORT
@@
echo "EXECUTOR_MANAGER_PORT=$EXECUTOR_MANAGER_PORT" >> "$temp_file"
echo "KNOWLEDGE_RUNTIME_PORT=$KNOWLEDGE_RUNTIME_PORT" >> "$temp_file"
+ echo "KNOWLEDGE_RUNTIME_URL=http://localhost:$KNOWLEDGE_RUNTIME_PORT" >> "$temp_file"
echo "WEGENT_FRONTEND_PORT=$WEGENT_FRONTEND_PORT" >> "$temp_file"
@@
-[ -n "$CLI_KNOWLEDGE_RUNTIME_PORT" ] && KNOWLEDGE_RUNTIME_PORT="$CLI_KNOWLEDGE_RUNTIME_PORT"
+if [ -n "$CLI_KNOWLEDGE_RUNTIME_PORT" ]; then
+ KNOWLEDGE_RUNTIME_PORT="$CLI_KNOWLEDGE_RUNTIME_PORT"
+ KNOWLEDGE_RUNTIME_URL=""
+fi

Also applies to: 81-104, 894-905
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@start.sh` around lines 58 - 61, The save_config logic currently removes
KNOWLEDGE_RUNTIME_PORT but leaves an existing KNOWLEDGE_RUNTIME_URL unchanged,
causing URL/port drift; update the script (the save_config/port-writing sections
that include the grep -v lines and the blocks around lines referenced) to also
remove any existing KNOWLEDGE_RUNTIME_URL entry when rewriting .env and then
write a new KNOWLEDGE_RUNTIME_URL derived from the configured
KNOWLEDGE_RUNTIME_PORT (e.g., construct
"http://localhost:${KNOWLEDGE_RUNTIME_PORT}" or use the existing host variable
if present) so the URL always matches the saved port; apply the same change to
the other similar blocks noted (around lines 81-104 and 894-905) so all places
that update KNOWLEDGE_RUNTIME_PORT also refresh KNOWLEDGE_RUNTIME_URL.
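As a sketch of the invariant the comment asks for, the helper below regenerates the URL from the port whenever the port is saved, so the two .env keys cannot drift. The function name is hypothetical and the localhost host is an assumption carried over from the suggested fix:

```python
def sync_runtime_url(env: dict[str, str], port: int) -> dict[str, str]:
    """Rewrite the two knowledge-runtime keys together so they cannot drift."""
    updated = dict(env)
    updated["KNOWLEDGE_RUNTIME_PORT"] = str(port)
    # Always regenerate the URL from the port instead of preserving a stale one.
    updated["KNOWLEDGE_RUNTIME_URL"] = f"http://localhost:{port}"
    return updated

stale = {"KNOWLEDGE_RUNTIME_URL": "http://localhost:8200"}
print(sync_runtime_url(stale, 8201)["KNOWLEDGE_RUNTIME_URL"])  # http://localhost:8201
```

Deriving the URL at write time keeps a single source of truth (the port) instead of two values that must be edited in tandem.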
- Use pyrfc6266 instead of a hand-written regex to parse Content-Disposition
- Correctly handle the RFC 5987 filename* parameter (UTF-8 encoded filenames)
- Fix start.sh so venv creation uses the specified Python version
- Add pyrfc6266 dependency
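As an aside on the filename* handling this commit mentions: the standard library's email.message.Message also decodes RFC 2231 extended parameters, which use the same charset''percent-value syntax as RFC 5987. A minimal sketch (the header value below is an example; the project itself uses pyrfc6266):

```python
from email.message import Message

# Hypothetical header value; %E6%8A%A5%E5%91%8A is the UTF-8 encoding of "报告".
msg = Message()
msg["Content-Disposition"] = "attachment; filename*=UTF-8''%E6%8A%A5%E5%91%8A.pdf"

print(msg.get_filename())  # 报告.pdf
```

A dedicated RFC 6266 parser is still the safer choice for HTTP, since it also covers quoting edge cases and the plain filename fallback ordering.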
…ests asyncio.to_thread expects sync functions, not coroutines. Using AsyncMock returns a coroutine object instead of the actual return value, causing TypeError: 'coroutine' object is not subscriptable errors.
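The failure mode is easy to reproduce outside the test suite: asyncio.to_thread invokes its target synchronously in a worker thread, so an AsyncMock target hands back an un-awaited coroutine rather than the stubbed value. A standalone sketch:

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock

async def main():
    # A sync MagicMock works: to_thread calls it and returns the stubbed dict.
    sync_stub = MagicMock(return_value={"status": "ok"})
    result = await asyncio.to_thread(sync_stub)
    print(result["status"])  # ok

    # An AsyncMock does not: calling it returns a coroutine object, which
    # to_thread passes through untouched, so subscripting raises TypeError.
    async_stub = AsyncMock(return_value={"status": "ok"})
    result = await asyncio.to_thread(async_stub)
    try:
        result["status"]
    except TypeError as exc:
        print(type(result).__name__, exc.__class__.__name__)
    result.close()  # silence the "never awaited" warning

asyncio.run(main())
```

The rule of thumb: mock whatever the production code calls the way production calls it; to_thread targets are called, not awaited.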
Actionable comments posted: 2
🧹 Nitpick comments (5)
knowledge_runtime/start.sh (3)
109-109: Quote $port in lsof invocation (SC2086).

Minor hygiene flagged by shellcheck; $port is validated but the unquoted expansion is still best avoided.

🛠️ Proposed fix

- if lsof -Pi :$port -sTCP:LISTEN -t >/dev/null 2>&1; then
+ if lsof -Pi ":$port" -sTCP:LISTEN -t >/dev/null 2>&1; then

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_runtime/start.sh` at line 109, The lsof invocation uses an unquoted variable expansion (the $port in the condition) which triggers SC2086; update the lsof call to quote the port variable (use "$port") in the expression that checks for a listening socket so the expansion is safe even if the value contains spaces or glob characters—locate the conditional containing lsof -Pi :$port -sTCP:LISTEN -t and replace the unquoted $port with a quoted "$port".
282-285: Avoid leading:inPYTHONPATHwhen unset.If
PYTHONPATHis empty on the invoking shell (the common case), the result is:${PROJECT_ROOT}, which Python interprets as including the current working directory onsys.path. Guard the separator:🛠️ Proposed fix
-PROJECT_ROOT=$(cd .. && pwd) -export PYTHONPATH="${PYTHONPATH}:${PROJECT_ROOT}" +PROJECT_ROOT=$(cd .. && pwd) +export PYTHONPATH="${PYTHONPATH:+${PYTHONPATH}:}${PROJECT_ROOT}"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_runtime/start.sh` around lines 282 - 285, The current startup snippet sets PYTHONPATH by appending with a leading colon which yields ":{PROJECT_ROOT}" when PYTHONPATH is empty; modify the logic in start.sh around the PROJECT_ROOT and PYTHONPATH export so you only prepend a colon when PYTHONPATH is non-empty (e.g., test [ -z "$PYTHONPATH" ] and export PYTHONPATH="$PROJECT_ROOT" else export PYTHONPATH="${PYTHONPATH}:$PROJECT_ROOT"), ensuring the echo message still prints the resolved PROJECT_ROOT variable.
249-253: Dead $? check — set -e already aborts on failure.

With set -e at line 10, uv sync failing terminates the script before reaching line 250, so the if [ $? -ne 0 ] block is unreachable (also SC2181). Either drop the check or run uv sync inside an if ! ...; then guarded block if you want a custom message.

🛠️ Proposed fix

-# Sync dependencies
-uv sync
-if [ $? -ne 0 ]; then
-    echo -e "${RED}Error: Failed to install dependencies${NC}"
-    exit 1
-fi
+# Sync dependencies
+if ! uv sync; then
+    echo -e "${RED}Error: Failed to install dependencies${NC}"
+    exit 1
+fi

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@knowledge_runtime/start.sh` around lines 249 - 253, The post-run `$?` check after running `uv sync` is unreachable because the script uses `set -e`; update the `uv sync` invocation to either remove the `if [ $? -ne 0 ]` block entirely or replace the plain call plus `$?` check with an explicit guarded form like `if ! uv sync; then ...` so you can emit the custom error message and exit; locate the call to `uv sync` in start.sh (and the surrounding `set -e` at the top) and apply the chosen fix to remove the dead `$?` check or wrap the `uv sync` in an `if ! ...; then` guard.

knowledge_runtime/knowledge_runtime/services/content_fetcher.py (2)
Verify each finding against the current code and only fix it if needed. In `@knowledge_runtime/start.sh` around lines 249 - 253, The post-run `$?` check after running `uv sync` is unreachable because the script uses `set -e`; update the `uv sync` invocation to either remove the `if [ $? -ne 0 ]` block entirely or replace the plain call plus `$?` check with an explicit guarded form like `if ! uv sync; then ...` so you can emit the custom error message and exit; locate the call to `uv sync` in start.sh (and the surrounding `set -e` at the top) and apply the chosen fix to remove the dead `$?` check or wrap the `uv sync` in an `if ! ...; then` guard.knowledge_runtime/knowledge_runtime/services/content_fetcher.py (2)
68-86: Missing tracing decorators on async methods.Per coding guidelines, async functions should be annotated with
@trace_async().fetch,_fetch_from_presigned_url, and_fetch_from_backend_streamare all async and part of a new service on the data path — likely the most useful place to have spans for troubleshooting. Consider adding the decorators (andadd_span_event/set_span_attributefor URL/status details) unless there's a reason this module is intentionally excluded from tracing.As per coding guidelines: "Use
@trace_async()decorator for entire async functions,@trace_sync()for entire sync functions,add_span_event()to add events,set_span_attribute()to set attributes".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_runtime/knowledge_runtime/services/content_fetcher.py` around lines 68 - 86, Add tracing to the async content fetchers: decorate fetch, _fetch_from_presigned_url, and _fetch_from_backend_stream with `@trace_async`(), and instrument them to emit useful span details by calling add_span_event() for notable events (e.g., "fetch_started", "presigned_url_requested", "backend_stream_opened") and set_span_attribute() for key attributes (e.g., request URL, HTTP status, content_ref.id). Ensure the decorators are imported from the tracing utilities and place span events/attributes near network calls and on error paths inside the named functions to capture URL/status/error details for troubleshooting.
62-67: Nit: extract retry-policy magic numbers to module constants.

stop_after_attempt(3) and wait_exponential(multiplier=1, min=1, max=10) are tuning knobs that are easier to reason about (and tweak per deployment) when pulled out as named constants — also aligns with the guideline to extract magic numbers.

+_MAX_FETCH_ATTEMPTS = 3
+_RETRY_WAIT_MIN_SECONDS = 1
+_RETRY_WAIT_MAX_SECONDS = 10
 ...
-    stop=stop_after_attempt(3),
-    wait=wait_exponential(multiplier=1, min=1, max=10),
+    stop=stop_after_attempt(_MAX_FETCH_ATTEMPTS),
+    wait=wait_exponential(
+        multiplier=1, min=_RETRY_WAIT_MIN_SECONDS, max=_RETRY_WAIT_MAX_SECONDS
+    ),

As per coding guidelines: "extract magic numbers to constants in Python code".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_runtime/knowledge_runtime/services/content_fetcher.py` around lines 62 - 67, Extract the numeric retry-policy literals into named module-level constants and use them in the retry decorator: define constants like RETRY_ATTEMPTS = 3, WAIT_MULTIPLIER = 1, WAIT_MIN = 1, WAIT_MAX = 10 at the top of the module and replace stop_after_attempt(3) and wait_exponential(multiplier=1, min=1, max=10) with stop_after_attempt(RETRY_ATTEMPTS) and wait_exponential(multiplier=WAIT_MULTIPLIER, min=WAIT_MIN, max=WAIT_MAX); keep the same retry wrapper and _is_retryable reference unchanged.
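The snippet above assumes tenacity's stop_after_attempt/wait_exponential. To show the shape of the suggestion without that dependency, here is a simplified stand-in retry decorator driven by the same kind of named constants (all names are illustrative; backoffs are recorded instead of slept, for the demo):

```python
import functools

# Named knobs, as the review suggests: easier to audit and tweak per deployment.
MAX_FETCH_ATTEMPTS = 3
RETRY_WAIT_MIN_SECONDS = 1
RETRY_WAIT_MAX_SECONDS = 10

def retry(attempts=MAX_FETCH_ATTEMPTS, waits=None):
    """Minimal retry decorator; computed backoffs go into `waits` for inspection."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == attempts:
                        raise
                    # Exponential backoff clamped to [min, max], like wait_exponential
                    delay = min(max(2 ** (attempt - 1), RETRY_WAIT_MIN_SECONDS),
                                RETRY_WAIT_MAX_SECONDS)
                    if waits is not None:
                        waits.append(delay)
        return wrapper
    return decorator

calls = []
recorded = []

@retry(waits=recorded)
def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("transient")
    return "ok"

print(flaky(), len(calls), recorded)  # ok 3 [1, 2]
```

With the limits named at module level, a deployment that needs more patience only has to change one constant rather than hunt for literals inside decorators.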
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@knowledge_runtime/knowledge_runtime/services/content_fetcher.py`:
- Around line 183-203: The _extract_filename_from_url function currently splits
the URL by "?" and leaves percent-encoding in the filename; update it to parse
the URL path using urllib.parse.urlparse(url).path, take the last path segment,
then percent-decode that segment with urllib.parse.unquote before extracting the
extension (keep the existing extension logic using rsplit(".", 1)). Ensure you
still handle trailing slashes and return a sensible default like "unknown" when
no filename segment exists.
In `@knowledge_runtime/start.sh`:
- Around line 294-295: The startup script currently hardcodes uvicorn --reload
in the line invoking ".venv/bin/python -m uvicorn knowledge_runtime.main:app
--host \"$HOST\" --port \"$PORT\" --reload"; remove the unconditional --reload
and make it opt-in by checking an environment flag or script argument (for
example DEV=true or a --dev flag) and only appending "--reload" when that flag
is set; update start.sh to build the uvicorn command conditionally so
production/default runs do not use --reload but dev runs can enable it.
---
Nitpick comments:
In `@knowledge_runtime/knowledge_runtime/services/content_fetcher.py`:
- Around line 68-86: Add tracing to the async content fetchers: decorate fetch,
_fetch_from_presigned_url, and _fetch_from_backend_stream with `@trace_async`(),
and instrument them to emit useful span details by calling add_span_event() for
notable events (e.g., "fetch_started", "presigned_url_requested",
"backend_stream_opened") and set_span_attribute() for key attributes (e.g.,
request URL, HTTP status, content_ref.id). Ensure the decorators are imported
from the tracing utilities and place span events/attributes near network calls
and on error paths inside the named functions to capture URL/status/error
details for troubleshooting.
- Around line 62-67: Extract the numeric retry-policy literals into named
module-level constants and use them in the retry decorator: define constants
like RETRY_ATTEMPTS = 3, WAIT_MULTIPLIER = 1, WAIT_MIN = 1, WAIT_MAX = 10 at the
top of the module and replace stop_after_attempt(3) and
wait_exponential(multiplier=1, min=1, max=10) with
stop_after_attempt(RETRY_ATTEMPTS) and
wait_exponential(multiplier=WAIT_MULTIPLIER, min=WAIT_MIN, max=WAIT_MAX); keep
the same retry wrapper and _is_retryable reference unchanged.
In `@knowledge_runtime/start.sh`:
- Line 109: The lsof invocation uses an unquoted variable expansion (the $port
in the condition) which triggers SC2086; update the lsof call to quote the port
variable (use "$port") in the expression that checks for a listening socket so
the expansion is safe even if the value contains spaces or glob
characters—locate the conditional containing lsof -Pi :$port -sTCP:LISTEN -t and
replace the unquoted $port with a quoted "$port".
- Around line 282-285: The current startup snippet sets PYTHONPATH by appending
with a leading colon which yields ":{PROJECT_ROOT}" when PYTHONPATH is empty;
modify the logic in start.sh around the PROJECT_ROOT and PYTHONPATH export so
you only prepend a colon when PYTHONPATH is non-empty (e.g., test [ -z
"$PYTHONPATH" ] and export PYTHONPATH="$PROJECT_ROOT" else export
PYTHONPATH="${PYTHONPATH}:$PROJECT_ROOT"), ensuring the echo message still
prints the resolved PROJECT_ROOT variable.
- Around line 249-253: The post-run `$?` check after running `uv sync` is
unreachable because the script uses `set -e`; update the `uv sync` invocation to
either remove the `if [ $? -ne 0 ]` block entirely or replace the plain call
plus `$?` check with an explicit guarded form like `if ! uv sync; then ...` so
you can emit the custom error message and exit; locate the call to `uv sync` in
start.sh (and the surrounding `set -e` at the top) and apply the chosen fix to
remove the dead `$?` check or wrap the `uv sync` in an `if ! ...; then` guard.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1e3c23ef-3488-4725-b899-bd326bb05bf7
⛔ Files ignored due to path filters (1)
knowledge_runtime/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
- knowledge_runtime/knowledge_runtime/services/content_fetcher.py
- knowledge_runtime/pyproject.toml
- knowledge_runtime/start.sh
- knowledge_runtime/tests/test_admin_executor.py
✅ Files skipped from review due to trivial changes (2)
- knowledge_runtime/tests/test_admin_executor.py
- knowledge_runtime/pyproject.toml
def _extract_filename_from_url(self, url: str) -> tuple[str, str]:
    """Extract filename and extension from URL path.

    Args:
        url: The URL to extract from.

    Returns:
        Tuple of (filename, extension).
    """
    # Try to get the last path segment
    path = url.split("?")[0]  # Remove query params
    parts = path.rstrip("/").split("/")
    filename = parts[-1] if parts else "unknown"

    # Extract extension
    if "." in filename:
        extension = "." + filename.rsplit(".", 1)[-1]
    else:
        extension = ""

    return filename, extension
URL-path filename is not percent-decoded.
For presigned URLs whose object key contains encoded characters (e.g. /bucket/my%20report.pdf or non-ASCII names), source_file will be stored with the percent-encoding intact (my%20report.pdf). The extension detection still works for common cases, but the persisted source_file that flows into IndexExecutor.execute() will be a mangled display name, and filenames whose encoding covers the extension separator (%2E) will lose their extension entirely.
Consider urllib.parse.unquote on the last path segment, and using urllib.parse.urlparse(url).path instead of manual split("?")[0] to also strip fragments and be robust to auth/userinfo components.
🛠️ Suggested tweak
- # Try to get the last path segment
- path = url.split("?")[0] # Remove query params
- parts = path.rstrip("/").split("/")
- filename = parts[-1] if parts else "unknown"
+ from urllib.parse import unquote, urlparse
+
+ path = urlparse(url).path
+ parts = path.rstrip("/").split("/")
+ filename = unquote(parts[-1]) if parts and parts[-1] else "unknown"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@knowledge_runtime/knowledge_runtime/services/content_fetcher.py` around lines
183 - 203, The _extract_filename_from_url function currently splits the URL by
"?" and leaves percent-encoding in the filename; update it to parse the URL path
using urllib.parse.urlparse(url).path, take the last path segment, then
percent-decode that segment with urllib.parse.unquote before extracting the
extension (keep the existing extension logic using rsplit(".", 1)). Ensure you
still handle trailing slashes and return a sensible default like "unknown" when
no filename segment exists.
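The difference is easy to see with an encoded object key (the URL below is a made-up example):

```python
from urllib.parse import unquote, urlparse

url = "https://minio.example/bucket/my%20report.pdf?X-Amz-Signature=abc#frag"

# Manual split leaves percent-encoding intact (and would also keep a
# fragment whenever there is no query string to split on).
naive = url.split("?")[0].rstrip("/").split("/")[-1]
print(naive)  # my%20report.pdf

# urlparse strips query and fragment; unquote restores the display name.
decoded = unquote(urlparse(url).path.rstrip("/").split("/")[-1])
print(decoded)  # my report.pdf
```

Decoding also protects extension detection when the dot itself is encoded (%2E), which the naive version misclassifies as extensionless.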
# Start uvicorn
.venv/bin/python -m uvicorn knowledge_runtime.main:app --host "$HOST" --port "$PORT" --reload
Remove unconditional --reload; this is a dev-only flag.
uvicorn --reload spawns a file-watcher process and reloads workers on any source change. That's inappropriate as the default for a "one-click startup" script that will also be used for local smoke tests and container/service deployments: it increases memory, interacts poorly with process-manager signal handling, and can cause surprise restarts while running. Make it opt-in via a flag (e.g., --dev) or remove it.
🛠️ Proposed fix
+# Parse --dev flag earlier in the arg loop; default off
+RELOAD_FLAG=""
+if [ "${DEV_MODE:-false}" = "true" ]; then
+ RELOAD_FLAG="--reload"
+fi
+
# Start uvicorn
-.venv/bin/python -m uvicorn knowledge_runtime.main:app --host "$HOST" --port "$PORT" --reload
+.venv/bin/python -m uvicorn knowledge_runtime.main:app --host "$HOST" --port "$PORT" $RELOAD_FLAG🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@knowledge_runtime/start.sh` around lines 294 - 295, The startup script
currently hardcodes uvicorn --reload in the line invoking ".venv/bin/python -m
uvicorn knowledge_runtime.main:app --host \"$HOST\" --port \"$PORT\" --reload";
remove the unconditional --reload and make it opt-in by checking an environment
flag or script argument (for example DEV=true or a --dev flag) and only
appending "--reload" when that flag is set; update start.sh to build the uvicorn
command conditionally so production/default runs do not use --reload but dev
runs can enable it.
Summary by CodeRabbit
New Features
Chores
Documentation