refactor(backend): remove local RAG mode and use knowledge_runtime exclusively#1013
sunnights wants to merge 17 commits into wecode-ai:main
Conversation
- Add authentication middleware verify_internal_token with constant-time comparison
- Add an authentication dependency to RAG operation endpoints (except health)
- Add Authorization header support to the Backend RemoteRagGateway
- Add the KNOWLEDGE_RUNTIME_TOKEN config option
- Add the environment variable to Docker Compose (commented out by default)
- Add authentication tests for knowledge_runtime and the Backend
- Rename KNOWLEDGE_RUNTIME_TOKEN to INTERNAL_SERVICE_TOKEN
- Use the same token for service-to-service authentication across Backend, knowledge_runtime, and chat_shell
- Update the config examples in docker-compose.yml and .env.example
- Update the related test cases
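The constant-time comparison these commits describe can be sketched with Python's stdlib. This is a minimal illustrative sketch: the name `verify_internal_token` is taken from the commit message, but the body here is an assumption, not the project's actual middleware.

```python
import hmac

def verify_internal_token(provided: str, expected: str) -> bool:
    # Fail closed when no token is configured at all
    if not expected:
        return False
    # hmac.compare_digest runs in constant time, so an attacker cannot
    # learn how many leading bytes matched from response timing
    return hmac.compare_digest(provided.encode("utf-8"), expected.encode("utf-8"))

print(verify_internal_token("secret-token", "secret-token"))  # True
print(verify_internal_token("secret-tokeX", "secret-token"))  # False
```

A naive `provided == expected` check would short-circuit on the first mismatched byte, which is exactly the timing leak the constant-time comparison avoids.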
…clusively

- Simplify gateway_factory.py to always return RemoteRagGateway singleton
- Remove RAG_RUNTIME_MODE config and related validation logic
- Delete LocalRagGateway and local_data_plane module
- Remove fallback logic from API endpoints and orchestrator
- Update all tests to use RemoteRagGateway mocks
- Update README.md to reflect new architecture

All RAG operations now route through knowledge_runtime service (:8200).

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Remove Backend's direct dependency on knowledge_engine module. All RAG operations now go through knowledge_runtime HTTP service via RemoteRagGateway.

Changes:
- Add storage-types API endpoint to knowledge_runtime for exposing storage type information (elasticsearch, qdrant, milvus)
- Add get_storage_types() method to RemoteRagGateway
- Modify retrievers.py to use RemoteRagGateway for storage type info
- Refactor retrieval_service.py to use RemoteRagGateway.query()
- Delete embedding module (helper function moved to runtime_resolver.py)
- Remove knowledge_engine from pyproject.toml dependencies
- Update tests to mock RemoteRagGateway instead of knowledge_engine

Architecture: Backend -> RemoteRagGateway -> knowledge_runtime -> knowledge_engine

Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Remove llama-index-* dependencies from backend and chat_shell
- Remove pymilvus, docx2txt, and related packages
- These dependencies are now only needed in knowledge_engine
- Reduces installation size and improves architecture clarity
📝 Walkthrough
Moves RAG execution out of the backend into a new FastAPI "knowledge_runtime" data-plane service, removes local data-plane code and fallback logic, makes the backend use a singleton RemoteRagGateway for all RAG operations, and updates Docker/compose, tests, configs, and protocol models for remote-only execution.
Sequence Diagram(s)
sequenceDiagram
participant Client as Client/Frontend
participant Backend as Backend (Control Plane)
participant RemoteGW as RemoteRagGateway
participant Runtime as knowledge_runtime (Data Plane)
participant KE as knowledge_engine
participant Storage as VectorStorage
Client->>Backend: POST /api/rag/retrieve
Backend->>RemoteGW: query(QueryRuntimeSpec)
RemoteGW->>Runtime: POST /internal/rag/query (RemoteQueryRequest)
Runtime->>Runtime: verify token, fetch content
Runtime->>KE: QueryExecutor.execute()
KE->>Storage: vector store ops
Storage-->>KE: results
KE-->>Runtime: RemoteQueryResponse
Runtime-->>RemoteGW: JSON response
RemoteGW-->>Backend: QueryResult
Backend-->>Client: API response
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 14
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
AGENTS.md (1)
562-613: ⚠️ Potential issue | 🟡 Minor — Update the quick-reference ports list for Knowledge Runtime.
Line 562 introduces default port 8200, but the quick reference at Line 613 still omits it.
📝 Proposed documentation update
-**Ports:** 3000 (frontend), 8000 (backend), 8001 (chat shell), 3306 (MySQL), 6379 (Redis)
+**Ports:** 3000 (frontend), 8000 (backend), 8001 (chat shell), 8200 (knowledge runtime), 3306 (MySQL), 6379 (Redis)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@AGENTS.md` around lines 562 - 613, Update the Quick Reference ports list in AGENTS.md to include the Knowledge Runtime default port 8200; find the "Quick Reference" section (the code block showing start services/tests/formatting) and add "8200 (knowledge runtime)" to the Ports line so it reads something like "Ports: 3000 (frontend), 8000 (backend), 8001 (chat shell), 8200 (knowledge runtime), 3306 (MySQL), 6379 (Redis)".
backend/app/services/knowledge/orchestrator.py (1)
2130-2134: ⚠️ Potential issue | 🟡 Minor — Update the stale fallback wording.
`retrieve_knowledge()` no longer supports local gateway execution or automatic fallback, but the docstring still says it does. Please update this to describe the remote-only `knowledge_runtime` path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/knowledge/orchestrator.py` around lines 2130 - 2134, The docstring for retrieve_knowledge() is stale: it claims local gateway execution and automatic fallback but the function now routes only via the remote-only knowledge_runtime path. Update the docstring in orchestrator.py to remove references to local gateway execution and automatic fallback, and instead state that retrieve_knowledge() is the unified entry point for MCP tools and the Open API that invokes the remote-only knowledge_runtime RAG gateway path (include mention of the knowledge_runtime flow and any relevant parameters it uses).
backend/app/api/endpoints/internal/rag.py (1)
235-251: ⚠️ Potential issue | 🔴 Critical — Route `direct_injection` mode through `list_chunks()` instead of `query()`.
`_execute_query()` unconditionally calls `rag_gateway.query()` for all route modes. However, when `_finalize_query_runtime_spec()` resolves `route_mode` to `"direct_injection"`, the request should route through `rag_gateway.list_chunks()` (as documented in `backend/app/services/rag/README.md` lines 73–74). Currently, direct_injection specs bypass the remote gateway's chunk retrieval path and instead hit the query endpoint without `knowledge_base_configs` populated, violating the intended routing architecture.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/endpoints/internal/rag.py` around lines 235 - 251, The function _execute_query currently always calls rag_gateway.query(), but when runtime_spec.route_mode == "direct_injection" it must route to rag_gateway.list_chunks() instead; update _execute_query to first ensure knowledge_base_configs is populated the same way as the existing branch (use runtime_resolver.build_query_knowledge_base_configs with db, knowledge_base_ids, user_id, user_name on runtime_spec) and then, if getattr(runtime_spec, "route_mode", None) == "direct_injection", call and return await rag_gateway.list_chunks(runtime_spec, db=db), otherwise call and return await rag_gateway.query(runtime_spec, db=db); keep using get_query_gateway() and the same runtime_spec/model_copy flow.
backend/tests/integration/test_rag_remote_mode.py (1)
21-21: ⚠️ Potential issue | 🟡 Minor — Unused import after fallback test removal.
`RemoteRagGatewayError` is no longer referenced anywhere in this file (the prior `test_internal_retrieve_falls_back_to_local_when_remote_query_fails_integration` was the only user). Remove the import.
🧹 Proposed cleanup
-from app.services.rag.remote_gateway import RemoteRagGatewayError
 from app.services.rag.runtime_specs import DirectInjectionBudget, QueryRuntimeSpec
Based on learnings: "Delete dead code aggressively: ensure that deprecated, unused, or obsolete code is removed regardless of effort required".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/tests/integration/test_rag_remote_mode.py` at line 21, Remove the now-unused import RemoteRagGatewayError from the top of the test module (import statement "from app.services.rag.remote_gateway import RemoteRagGatewayError") since the fallback test that referenced it was removed; simply delete that import line to eliminate the dead/unused symbol and ensure no other references to RemoteRagGatewayError remain in test_rag_remote_mode.py.
backend/app/api/endpoints/adapter/retrievers.py (1)
38-108: ⚠️ Potential issue | 🟡 Minor — Missing error handling for `RemoteRagGatewayError` on storage-types endpoints.
Both `get_storage_retrieval_methods` (lines 38–71) and `get_storage_type_retrieval_methods` (lines 74–108) now perform a remote HTTP call but let `RemoteRagGatewayError` propagate as an unhandled 500. Compare with `test_retriever_connection` (line 295), which catches it and returns a structured error. If `knowledge_runtime` is down or misconfigured, the frontend will get a confusing 500 instead of a meaningful failure.
Also, the case-insensitive comparison on line 98 only lowercases the user input (`storage_type.lower()`) but assumes `type_info.type` is already lowercase. If the remote service ever returns mixed-case names, the match silently fails. Consider `.lower()` on both sides.
🛠️ Suggested fix
 async def get_storage_retrieval_methods():
     ...
     _check_rag_available()
-    gateway = RemoteRagGateway()
-    response = await gateway.get_storage_types()
+    try:
+        gateway = RemoteRagGateway()
+        response = await gateway.get_storage_types()
+    except RemoteRagGatewayError as e:
+        logger.error(f"Failed to fetch storage types: {e}")
+        raise HTTPException(status_code=502, detail=f"Knowledge Runtime unavailable: {e}")
     ...
-    if type_info.type == storage_type.lower():
+    if type_info.type.lower() == storage_type.lower():
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/endpoints/adapter/retrievers.py` around lines 38 - 108, Both endpoints get_storage_retrieval_methods and get_storage_type_retrieval_methods call gateway.get_storage_types without handling RemoteRagGatewayError and do a one-sided lowercase comparison; wrap the await gateway.get_storage_types() call in a try/except that catches RemoteRagGatewayError (the same pattern used in test_retriever_connection), and on error raise an HTTPException with a clear status_code (e.g., 502 or 424) and the gateway error message; also make the storage type comparison robust by lowercasing both sides (compare type_info.type.lower() to storage_type.lower()) so mixed-case names from the remote service still match.
🧹 Nitpick comments (11)
knowledge_runtime/knowledge_runtime/core/logging.py (1)
33-43: Hoist the `get_request_id` import out of the filter hot path.
`RequestIdFilter.filter` runs on every log record, so importing `shared.telemetry.context.span` on each call — even with `sys.modules` caching — adds unnecessary work and keeps dead `ImportError` handling for what is effectively a hard runtime dependency. Importing once at module scope makes the hot path a single function call and surfaces missing-dependency errors at startup rather than per-record.
♻️ Proposed refactor
 import logging
 import os
 import sys
 from logging.handlers import TimedRotatingFileHandler
+
+try:
+    from shared.telemetry.context.span import get_request_id
+except ImportError:  # pragma: no cover - shared is a required dep
+    def get_request_id():  # type: ignore[misc]
+        return None
@@
     def filter(self, record: logging.LogRecord) -> bool:
         ...
-        try:
-            from shared.telemetry.context.span import get_request_id
-
-            request_id = get_request_id()
-            record.request_id = request_id if request_id else "-"
-        except ImportError:
-            # If telemetry module is not available, use placeholder
-            record.request_id = "-"
-        except Exception:
-            # Fallback for any other errors
-            record.request_id = "-"
-
+        try:
+            request_id = get_request_id()
+        except Exception:  # filters must never raise
+            request_id = None
+        record.request_id = request_id or "-"
         return True
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_runtime/knowledge_runtime/core/logging.py` around lines 33 - 43, Hoist the import of get_request_id out of RequestIdFilter.filter by importing it once at module scope: attempt "from shared.telemetry.context.span import get_request_id" at top of the module and handle ImportError there (e.g., set a local stub get_request_id = lambda: None or leave it None) so missing-dependency errors surface at startup; then simplify RequestIdFilter.filter to call get_request_id() (or check if callable) and set record.request_id = get_request_id() or "-" accordingly, removing the per-record try/except ImportError block.
backend/app/services/rag/runtime_resolver.py (1)
35-60: Type hint drifts from the early-return behavior.
`_process_custom_headers_placeholders` is annotated to return `dict[str, Any]`, but the early return at line 51 passes through `custom_headers` unchanged when it's falsy or non-dict — possibly returning `None`/`list`/etc. The sole current caller (line 687) already checks `isinstance(custom_headers, dict)` before invoking, so this branch is unreachable in practice, but a future reuse could silently break the contract.
Tightening either the caller's contract or the helper's fallback is a cheap safety win.
♻️ Suggested tightening
 def _process_custom_headers_placeholders(
-    custom_headers: dict[str, Any], user_name: str | None = None
+    custom_headers: Any, user_name: str | None = None
 ) -> dict[str, Any]:
     ...
-    if not custom_headers or not isinstance(custom_headers, dict):
-        return custom_headers
+    if not custom_headers or not isinstance(custom_headers, dict):
+        return {}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/rag/runtime_resolver.py` around lines 35 - 60, The function _process_custom_headers_placeholders currently declares return type dict[str, Any] but returns the raw custom_headers on early-exit, which can violate the contract; fix by normalizing inputs: if not custom_headers (None/empty/False) return an empty dict{}; if custom_headers is present but not a dict, raise a TypeError to fail fast; then proceed to build data_sources and call build_default_headers_with_placeholders as before. This keeps the return type stable and surfaces invalid inputs immediately.
knowledge_runtime/knowledge_runtime/services/admin_executor.py (1)
13-13: Unused import: `DocumentService`.
`DocumentService` is imported but never referenced in this module. Please remove it to avoid drifting dependencies and accidental coupling to `knowledge_engine.services.document_service`.
-from knowledge_engine.services.document_service import DocumentService
-from knowledge_engine.storage.factory import (
+from knowledge_engine.storage.factory import (
     create_storage_backend_from_runtime_config,
     get_all_storage_retrieval_methods,
     get_supported_storage_types,
 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_runtime/knowledge_runtime/services/admin_executor.py` at line 13, Remove the unused import DocumentService from the import list in admin_executor.py: delete the line "from knowledge_engine.services.document_service import DocumentService" and ensure no other references to DocumentService remain in the module (or replace any accidental usage with the correct service). After removal, run linters/tests to confirm no import-related errors.
knowledge_runtime/tests/test_logging.py (1)
66-206: Add a fixture to save/restore root logger state and close file handlers.
Every test mutates the global root logger (and `uvicorn.access`) via `handlers.clear()` + `setup_logging(...)`, but none of them close the handlers or restore prior state when finished. This has two consequences:
- File handlers opened by `setup_logging(log_file_enabled=True, log_dir=tmpdir, ...)` remain attached to the root logger after the test returns. On Windows (and on any platform where the test order causes a later write) the open file descriptor will block `TemporaryDirectory` cleanup, and log records emitted by subsequent tests can still be written into files under an already-removed tmpdir.
- The last test's handlers leak into the rest of the pytest session, polluting unrelated tests in the package.
Consider an `autouse` fixture that snapshots/clears handlers before each test and closes/restores them after:
♻️ Proposed fixture
+@pytest.fixture(autouse=True)
+def _reset_logging():
+    loggers = [logging.getLogger(), logging.getLogger("uvicorn.access")]
+    saved = [(lg, list(lg.handlers), lg.level, lg.propagate) for lg in loggers]
+    for lg in loggers:
+        lg.handlers.clear()
+    try:
+        yield
+    finally:
+        for lg, handlers, level, propagate in saved:
+            for h in lg.handlers:
+                try:
+                    h.close()
+                except Exception:
+                    pass
+            lg.handlers = handlers
+            lg.setLevel(level)
+            lg.propagate = propagate
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_runtime/tests/test_logging.py` around lines 66 - 206, Tests mutate the global root logger and leak file handlers; add an autouse pytest fixture in the test module that snapshots the root logger and any relevant child loggers (e.g., "uvicorn.access", "uvicorn", "fastapi", "httpx", "httpcore") before each test, clears handlers and sets desired state, and after the test closes any file/stream handlers, restores the original handlers, levels and propagate flags so setup_logging (and tests like test_setup_logging_file_mode, test_setup_logging_access_log_separate_file, test_setup_logging_access_log_console_mode) cannot leak open file descriptors or mutated logger state across tests.
backend/app/services/rag/remote_gateway.py (1)
73-119: Duplicated header construction between `_post_model` and `_get`.
The `Authorization` header building is repeated verbatim. Extract a small helper to keep a single source of truth.
♻️ Proposed refactor
+    def _build_headers(self) -> dict[str, str]:
+        headers: dict[str, str] = {}
+        if self._auth_token:
+            headers["Authorization"] = f"Bearer {self._auth_token}"
+        return headers
+
     async def _post_model(self, path: str, payload: Any) -> dict[str, Any]:
-        headers = {}
-        if self._auth_token:
-            headers["Authorization"] = f"Bearer {self._auth_token}"
-
         try:
             async with httpx.AsyncClient(timeout=self._timeout) as client:
                 response = await client.post(
                     f"{self._base_url}{path}",
                     json=payload.model_dump(mode="json", exclude_none=True),
-                    headers=headers,
+                    headers=self._build_headers(),
                 )
     ...
     async def _get(self, path: str) -> dict[str, Any]:
         """Make a GET request to knowledge_runtime."""
-        headers = {}
-        if self._auth_token:
-            headers["Authorization"] = f"Bearer {self._auth_token}"
-
         try:
             async with httpx.AsyncClient(timeout=self._timeout) as client:
                 response = await client.get(
                     f"{self._base_url}{path}",
-                    headers=headers,
+                    headers=self._build_headers(),
                 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/rag/remote_gateway.py` around lines 73 - 119, Both _post_model and _get duplicate the Authorization header construction; create a small helper method (e.g., _build_headers or _auth_headers) that returns the headers dict using self._auth_token and reuse it from _post_model and _get (replace the inline header logic with a call to that helper); ensure the helper lives on the same class as _post_model/_get and is used everywhere this pattern appears so header construction is a single source of truth.
backend/app/services/rag/retrieval_service.py (1)
729-740: Redundant local import and duplicate `RagRuntimeResolver` instantiation.
`RagRuntimeResolver` is already imported at module level (line 19) and `self.runtime_resolver` is initialized in `__init__` (line 41). Creating a new instance here is wasteful and inconsistent with the rest of the class. Similarly, `get_list_chunks_gateway` should be hoisted to the module-level imports next to `get_query_gateway` (line 18).
♻️ Proposed refactor
-from app.services.rag.gateway_factory import get_query_gateway
+from app.services.rag.gateway_factory import (
+    get_list_chunks_gateway,
+    get_query_gateway,
+)

-        from app.services.rag.gateway_factory import get_list_chunks_gateway
-        from app.services.rag.runtime_resolver import RagRuntimeResolver
-
         # Build runtime spec via resolver
-        runtime_resolver = RagRuntimeResolver()
-        spec = runtime_resolver.build_internal_list_chunks_runtime_spec(
+        spec = self.runtime_resolver.build_internal_list_chunks_runtime_spec(
             db=db,
             knowledge_base_id=knowledge_base_id,
             max_chunks=max_chunks,
             query=query,
             metadata_condition=metadata_condition,
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/rag/retrieval_service.py` around lines 729 - 740, The code redundantly imports get_list_chunks_gateway locally and creates a new RagRuntimeResolver instead of using the module-level imports and the instance on the service; remove the local import and the local instantiation and use the existing module-level get_list_chunks_gateway and the instance attribute self.runtime_resolver (already created in __init__) when calling build_internal_list_chunks_runtime_spec and when obtaining the gateway (consistent with get_query_gateway usage).
knowledge_runtime/knowledge_runtime/main.py (1)
24-25: Missing return type annotation on `lifespan`.
Per coding guidelines, Python functions require type hints. `lifespan` returns an async context yielding nothing.
💡 Proposed annotation
+from collections.abc import AsyncIterator
+
 @asynccontextmanager
-async def lifespan(app: FastAPI):
+async def lifespan(app: FastAPI) -> AsyncIterator[None]:
     """Application lifespan handler for startup and shutdown events."""
As per coding guidelines: "Follow PEP 8, use Black formatter (line length: 88), isort for import organization, and require type hints".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_runtime/knowledge_runtime/main.py` around lines 24 - 25, The function lifespan is missing a return type hint; annotate it as returning an asynchronous generator yielding nothing (e.g., add "-> AsyncGenerator[None, None]" to the lifespan signature) and import the AsyncGenerator type from typing (or typing_extensions if required), so the asynccontextmanager-decorated function has a proper type hint; update the lifespan definition and add the needed import while keeping the function behavior unchanged.
knowledge_runtime/start.sh (1)
244-248: Dead exit-code check under `set -e`.
With `set -e` at line 10, a non-zero exit from `uv sync` already aborts the script, so the `if [ $? -ne 0 ]` branch is unreachable. Either drop the check or invert it with `if ! uv sync;` for explicit handling (shellcheck SC2181).
🛠️ Suggested fix
-# Sync dependencies
-uv sync
-if [ $? -ne 0 ]; then
-    echo -e "${RED}Error: Failed to install dependencies${NC}"
-    exit 1
-fi
+# Sync dependencies
+if ! uv sync; then
+    echo -e "${RED}Error: Failed to install dependencies${NC}"
+    exit 1
+fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_runtime/start.sh` around lines 244 - 248, The post-command exit-code check after running "uv sync" is dead because "set -e" causes the script to exit on any non-zero status; either remove the redundant "if [ $? -ne 0 ]" block entirely or change the construct to explicitly handle failure by running "if ! uv sync; then" and keeping the echo/exit handling; update the block surrounding the "uv sync" invocation (and any related echo/exit lines) accordingly so there is no unreachable code under the existing "set -e" behavior.
docker-compose.yml (1)
263-299: Nit: `knowledge_runtime` depends on `backend` for content fetch but the backend has no healthcheck.
`depends_on: backend` (lines 288–289) only waits for the container to start, not for it to be ready. Since `knowledge_runtime`'s `ContentFetcher` needs the backend for `BackendAttachmentStreamContentRef` fetches, consider adding a healthcheck on the `backend` service and using `condition: service_healthy` here — otherwise early content fetches may fail on cold-start. Non-blocking.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docker-compose.yml` around lines 263 - 299, Add a proper healthcheck to the backend service and make knowledge_runtime wait for that health status before starting: define a backend healthcheck that verifies the backend readiness endpoint (e.g., internal health route used by ContentFetcher/BackendAttachmentStreamContentRef) and update knowledge_runtime's depends_on to require the backend's service_healthy condition so the ContentFetcher won't attempt early fetches on cold-start; reference the services by name (backend, knowledge_runtime) and the components ContentFetcher and BackendAttachmentStreamContentRef when making the changes.
knowledge_runtime/knowledge_runtime/services/content_fetcher.py (1)
213-224: `Content-Disposition` parsing misses RFC 5987 and multi-param headers.
The current parser will mis-handle common real-world headers like:
- `attachment; filename*=UTF-8''na%C3%AFve.pdf` (RFC 5987 encoded form — ignored entirely).
- `attachment; filename="a.pdf"; size=123` — the split-on-`filename=` + strip-quotes leaves `a.pdf"; size=123` on the wrong boundary.
Consider using `email.message.Message` or werkzeug/rfc6266-style parsing, or at minimum split the tail on `;` and handle `filename*=`:
-        if "filename=" in content_disposition:
-            parts = content_disposition.split("filename=")
-            if len(parts) > 1:
-                filename = parts[1].strip('"').strip("'")
+        if "filename" in content_disposition:
+            from email.message import Message
+            msg = Message()
+            msg["content-disposition"] = content_disposition
+            filename = msg.get_filename() or ""
+        if filename:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_runtime/knowledge_runtime/services/content_fetcher.py` around lines 213 - 224, The Content-Disposition parsing in content_fetcher.py (the block that inspects response.headers.get("Content-Disposition")) fails for RFC5987 and multi-param headers; update the logic in that parsing block (or the surrounding function that extracts filename/extension) to robustly parse parameters by splitting the header into params on ";" and handling both filename and filename* keys (use RFC5987 decoding via urllib.parse.unquote and charset/lang stripping), and ensure you stop at param boundaries (trim trailing params like size=) before extracting the filename and extension; alternatively replace the custom parse with a standard parser such as email.message.Message.get_param / cgi.parse_header or an rfc6266 helper to correctly handle quoted filenames and encoded filename* values.knowledge_runtime/knowledge_runtime/services/query_executor.py (1)
56-56: Sort key treats score `0` (and negatives) as missing.
`r.score or 0` conflates `None`, `0.0`, and any falsy/negative score. If a backend ever returns `0.0` or signed distances, the sort order becomes inconsistent with real ranking. Prefer an explicit None check:
-    all_records.sort(key=lambda r: r.score or 0, reverse=True)
+    all_records.sort(
+        key=lambda r: r.score if r.score is not None else float("-inf"),
+        reverse=True,
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_runtime/knowledge_runtime/services/query_executor.py` at line 56, The sort key in query_executor.py currently treats falsy scores (like 0.0 or negative values) as missing; update the lambda used in the all_records.sort call inside the relevant query execution logic to explicitly check for None and only substitute a default when score is None (i.e., return the record's score when it is not None, otherwise return the fallback value), so zero and negative scores are preserved in ordering.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@AGENTS.md`:
- Around line 552-561: Add the missing GET /internal/rag/storage-types endpoint
to the Endpoints list in AGENTS.md so the quick reference matches the actual
routes; update the endpoints block (the list that currently contains GET
/internal/rag/health, POST /internal/rag/index, etc.) to include `GET
/internal/rag/storage-types` and, if desired, a short description like "List
available storage types" to mirror the route implemented in
knowledge_runtime/knowledge_runtime/api/endpoints/admin.py.
In `@backend/tests/services/rag/test_remote_gateway.py`:
- Around line 717-758: The test test_gateway_no_auth_header_when_token_empty is
brittle because it assumes settings.INTERNAL_SERVICE_TOKEN is empty; before
creating RemoteRagGateway set settings.INTERNAL_SERVICE_TOKEN to "" (use
monkeypatch.setattr(settings, "INTERNAL_SERVICE_TOKEN", "")) to isolate the test
and ensure RemoteRagGateway(auth_token="") does not fall back to the settings
token—mirror the approach used in
test_gateway_uses_settings_token_when_not_provided to guarantee the headers
assertion remains deterministic.
In `@docker/knowledge_runtime/Dockerfile`:
- Around line 31-42: Add an unprivileged user and switch to it before the
container starts: create a user/group (e.g., appuser), ensure /app and /app/logs
are owned and writable by that user (chown/chmod as needed), then add a USER
appuser line prior to CMD so uvicorn runs unprivileged; keep the existing
HEALTHCHECK and EXPOSE lines unchanged but ensure the health endpoint is
reachable by the non-root user.
In `@docs/plans/2026-04-19-knowledge-runtime-architecture.md`:
- Line 34: The fenced code blocks in the document containing ASCII diagrams and
flow charts are missing language identifiers (causing markdownlint MD040);
update each fenced block that contains ASCII art or flow charts to include an
appropriate language tag such as "text" or a more specific identifier (e.g.,
"graphviz" or "mermaid" if applicable) so the blocks read as ```text (or
```graphviz/```mermaid) and satisfy MD040 while preserving the diagrams'
rendering.
In `@knowledge_runtime/.env.example`:
- Around line 13-16: The dotenv keys are out of the expected lint order; move
LOG_DIR to appear before LOG_FILE_ENABLED in the .env example so keys read
LOG_DIR, LOG_FILE_ENABLED, LOG_LEVEL (update the block containing
LOG_FILE_ENABLED, LOG_DIR, LOG_LEVEL accordingly) to satisfy dotenv-linter's
expected ordering for those variables.
In `@knowledge_runtime/knowledge_runtime/config.py`:
- Around line 36-39: The current default internal_service_token = "" leaves
internal endpoints unprotected; change the config so an internal_service_token
is required at startup: update the settings model by making
internal_service_token a required field (e.g., internal_service_token: str or
internal_service_token: str = Field(...)) in the same config class so the app
fails to start if ENV INTERNAL_SERVICE_TOKEN is not set; ensure any code that
reads internal_service_token (the auth dependency that checks tokens) keeps
using that field unchanged.
In `@knowledge_runtime/knowledge_runtime/main.py`:
- Around line 51-69: The global_exception_handler currently exposes raw
exception messages via RemoteRagError (message=str(exc) and
details={"exception_type": ...}); change it to return a generic message (e.g.,
"Internal server error") and avoid sending exception_type or raw details to
callers by default; keep the full exception in server logs via logger.exception,
and only include detailed exception info in the RemoteRagError.details when
running a non-production mode (check an environment/config flag), referencing
the global_exception_handler and RemoteRagError (and be mindful that
RemoteRagGatewayError consumers will forward message verbatim).
In `@knowledge_runtime/knowledge_runtime/middleware/auth.py`:
- Around line 38-40: The current check that returns when expected_token is falsy
(variable expected_token) creates a fail-open auth hole if
INTERNAL_SERVICE_TOKEN is missing; change this to fail closed: if expected_token
is missing, respond with a 401/403 and an error log instead of returning. If you
want a dev opt-out, add an explicit env flag (e.g., DEV_AUTH_DISABLED or
ALLOW_DEV_UNAUTH) and only bypass authentication when that flag is truthy and
explicitly set, otherwise require INTERNAL_SERVICE_TOKEN; update the middleware
(the auth function referencing expected_token) to enforce this and log a clear
warning when dev opt-out is used.
In `@knowledge_runtime/knowledge_runtime/services/content_fetcher.py`:
- Around line 56-175: The retry on fetch() currently only retries
httpx.TransportError but both _fetch_from_presigned_url and
_fetch_from_backend_stream catch TransportError and raise ContentFetchError, so
tenacity never retries; update fetch() to retry based on
ContentFetchError.retryable by replacing the decorator’s retry argument with a
predicate (e.g., use tenacity.retry_if_exception with a helper function
_is_retryable that returns isinstance(exc, ContentFetchError) and exc.retryable)
so fetch() will retry when inner methods raise
ContentFetchError(retryable=True); keep the inner handlers raising
ContentFetchError as they are and only change the `@retry`(...) on fetch() to
reference the new predicate.
In `@knowledge_runtime/pyproject.toml`:
- Line 9: Update the requires-python spec in pyproject.toml to allow all Python
3.13 patch releases by replacing the current constraint requires-python =
">=3.10,<=3.13" with a range using an exclusive upper bound, e.g.
requires-python = ">=3.10,<3.14", so the requires-python field correctly
includes Python 3.13.x versions.
In `@knowledge_runtime/start.sh`:
- Around line 232-290: The script assumes it's run from the knowledge_runtime
directory; update start.sh so all workspace-relative operations (the
pyproject.toml existence check, creating/using .venv, copying .env.example,
computing PROJECT_ROOT, and invoking .venv/bin/python) run relative to the
script's directory: determine the script directory (using the equivalent of
BASH_SOURCE/dirname resolution), cd into that directory at the top of start.sh
before the pyproject.toml check, and ensure PROJECT_ROOT is computed relative to
that script directory (so the later export PYTHONPATH and .venv invocation
reference the correct paths).
In `@knowledge_runtime/tests/conftest.py`:
- Around line 7-31: Add explicit return type hints and move the MagicMock import
to the module top: import MagicMock from unittest.mock once at the top of the
file, then annotate the two fixtures mock_storage_backend and mock_embed_model
with return types (e.g., -> MagicMock) so the fixtures are typed; keep fixture
bodies identical but remove the inline imports and ensure isort/Black formatting
rules are followed.
In `@start.sh`:
- Around line 1586-1595: The knowledge_runtime start command uses export
INTERNAL_SERVICE_TOKEN=\$INTERNAL_SERVICE_TOKEN which can mismatch the backend
because the backend start does not export the same variable; update the backend
start invocation (the other start_service call that launches "backend") to also
export INTERNAL_SERVICE_TOKEN from the top-level environment so both services
receive the identical token, and ensure the knowledge_runtime start_service call
references the same unescaped VARIABLE (so it reads the shell/exported value)
rather than relying on its own .env; locate the start_service calls for
"knowledge_runtime" and "backend" and make the backend start command include
export INTERNAL_SERVICE_TOKEN=\$INTERNAL_SERVICE_TOKEN (or centralize the token
read at the top-level load_config and export it once before any start_service
calls).
---
Outside diff comments:
In `@AGENTS.md`:
- Around line 562-613: Update the Quick Reference ports list in AGENTS.md to
include the Knowledge Runtime default port 8200; find the "Quick Reference"
section (the code block showing start services/tests/formatting) and add "8200
(knowledge runtime)" to the Ports line so it reads something like "Ports: 3000
(frontend), 8000 (backend), 8001 (chat shell), 8200 (knowledge runtime), 3306
(MySQL), 6379 (Redis)".
In `@backend/app/api/endpoints/adapter/retrievers.py`:
- Around line 38-108: Both endpoints get_storage_retrieval_methods and
get_storage_type_retrieval_methods call gateway.get_storage_types without
handling RemoteRagGatewayError and do a one-sided lowercase comparison; wrap the
await gateway.get_storage_types() call in a try/except that catches
RemoteRagGatewayError (the same pattern used in test_retriever_connection), and
on error raise an HTTPException with a clear status_code (e.g., 502 or 424) and
the gateway error message; also make the storage type comparison robust by
lowercasing both sides (compare type_info.type.lower() to storage_type.lower())
so mixed-case names from the remote service still match.
In `@backend/app/api/endpoints/internal/rag.py`:
- Around line 235-251: The function _execute_query currently always calls
rag_gateway.query(), but when runtime_spec.route_mode == "direct_injection" it
must route to rag_gateway.list_chunks() instead; update _execute_query to first
ensure knowledge_base_configs is populated the same way as the existing branch
(use runtime_resolver.build_query_knowledge_base_configs with db,
knowledge_base_ids, user_id, user_name on runtime_spec) and then, if
getattr(runtime_spec, "route_mode", None) == "direct_injection", call and return
await rag_gateway.list_chunks(runtime_spec, db=db), otherwise call and return
await rag_gateway.query(runtime_spec, db=db); keep using get_query_gateway() and
the same runtime_spec/model_copy flow.
In `@backend/app/services/knowledge/orchestrator.py`:
- Around line 2130-2134: The docstring for retrieve_knowledge() is stale: it
claims local gateway execution and automatic fallback but the function now
routes only via the remote-only knowledge_runtime path. Update the docstring in
orchestrator.py to remove references to local gateway execution and automatic
fallback, and instead state that retrieve_knowledge() is the unified entry point
for MCP tools and the Open API that invokes the remote-only knowledge_runtime
RAG gateway path (include mention of the knowledge_runtime flow and any relevant
parameters it uses).
In `@backend/tests/integration/test_rag_remote_mode.py`:
- Line 21: Remove the now-unused import RemoteRagGatewayError from the top of
the test module (import statement "from app.services.rag.remote_gateway import
RemoteRagGatewayError") since the fallback test that referenced it was removed;
simply delete that import line to eliminate the dead/unused symbol and ensure no
other references to RemoteRagGatewayError remain in test_rag_remote_mode.py.
---
Nitpick comments:
In `@backend/app/services/rag/remote_gateway.py`:
- Around line 73-119: Both _post_model and _get duplicate the Authorization
header construction; create a small helper method (e.g., _build_headers or
_auth_headers) that returns the headers dict using self._auth_token and reuse it
from _post_model and _get (replace the inline header logic with a call to that
helper); ensure the helper lives on the same class as _post_model/_get and is
used everywhere this pattern appears so header construction is a single source
of truth.
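A minimal sketch of the single-source-of-truth helper, assuming the gateway stores the token on `self._auth_token` as the comment describes:

```python
class RemoteRagGatewaySketch:
    """Illustrative skeleton; only the header-building concern is shown."""

    def __init__(self, auth_token: str = "") -> None:
        self._auth_token = auth_token

    def _auth_headers(self) -> dict[str, str]:
        # Reused by _post_model and _get so the Authorization header is
        # constructed in exactly one place.
        if not self._auth_token:
            return {}
        return {"Authorization": f"Bearer {self._auth_token}"}
```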
In `@backend/app/services/rag/retrieval_service.py`:
- Around line 729-740: The code redundantly imports get_list_chunks_gateway
locally and creates a new RagRuntimeResolver instead of using the module-level
imports and the instance on the service; remove the local import and the local
instantiation and use the existing module-level get_list_chunks_gateway and the
instance attribute self.runtime_resolver (already created in __init__) when
calling build_internal_list_chunks_runtime_spec and when obtaining the gateway
(consistent with get_query_gateway usage).
In `@backend/app/services/rag/runtime_resolver.py`:
- Around line 35-60: The function _process_custom_headers_placeholders currently
declares return type dict[str, Any] but returns the raw custom_headers on
early-exit, which can violate the contract; fix by normalizing inputs: if not
custom_headers (None/empty/False) return an empty dict{}; if custom_headers is
present but not a dict, raise a TypeError to fail fast; then proceed to build
data_sources and call build_default_headers_with_placeholders as before. This
keeps the return type stable and surfaces invalid inputs immediately.
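The normalization step can be sketched in isolation (the placeholder-building part is elided; `normalize_custom_headers` is a hypothetical name):

```python
from typing import Any

def normalize_custom_headers(custom_headers: Any) -> dict[str, Any]:
    # None/empty/falsy input: return a dict so the declared return type always holds.
    if not custom_headers:
        return {}
    # Fail fast on wrong types instead of silently passing them through.
    if not isinstance(custom_headers, dict):
        raise TypeError(
            f"custom_headers must be a dict, got {type(custom_headers).__name__}"
        )
    return custom_headers
```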
In `@docker-compose.yml`:
- Around line 263-299: Add a proper healthcheck to the backend service and make
knowledge_runtime wait for that health status before starting: define a backend
healthcheck that verifies the backend readiness endpoint (e.g., internal health
route used by ContentFetcher/BackendAttachmentStreamContentRef) and update
knowledge_runtime's depends_on to require the backend's service_healthy
condition so the ContentFetcher won't attempt early fetches on cold-start;
reference the services by name (backend, knowledge_runtime) and the components
ContentFetcher and BackendAttachmentStreamContentRef when making the changes.
In `@knowledge_runtime/knowledge_runtime/core/logging.py`:
- Around line 33-43: Hoist the import of get_request_id out of
RequestIdFilter.filter by importing it once at module scope: attempt "from
shared.telemetry.context.span import get_request_id" at top of the module and
handle ImportError there (e.g., set a local stub get_request_id = lambda: None
or leave it None) so missing-dependency errors surface at startup; then simplify
RequestIdFilter.filter to call get_request_id() (or check if callable) and set
record.request_id = get_request_id() or "-" accordingly, removing the per-record
try/except ImportError block.
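The hoisted-import pattern looks like this; in an environment without `shared.telemetry`, the stub path is taken and every record gets `"-"`:

```python
import logging

# Import once at module scope so a missing dependency surfaces at startup,
# not on every log record.
try:
    from shared.telemetry.context.span import get_request_id
except ImportError:
    get_request_id = None  # stub when the telemetry package is unavailable

class RequestIdFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        rid = get_request_id() if callable(get_request_id) else None
        record.request_id = rid or "-"
        return True
```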
In `@knowledge_runtime/knowledge_runtime/main.py`:
- Around line 24-25: The function lifespan is missing a return type hint;
annotate it as returning an asynchronous generator yielding nothing (e.g., add
"-> AsyncGenerator[None, None]" to the lifespan signature) and import the
AsyncGenerator type from typing (or typing_extensions if required), so the
asynccontextmanager-decorated function has a proper type hint; update the
lifespan definition and add the needed import while keeping the function
behavior unchanged.
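The annotated signature looks like this (a generic sketch, with the app parameter left untyped since FastAPI is not imported here):

```python
import asyncio
from collections.abc import AsyncGenerator
from contextlib import asynccontextmanager

@asynccontextmanager
async def lifespan(app: object) -> AsyncGenerator[None, None]:
    # startup work would go here
    yield
    # shutdown work would go here
```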
In `@knowledge_runtime/knowledge_runtime/services/admin_executor.py`:
- Line 13: Remove the unused import DocumentService from the import list in
admin_executor.py: delete the line "from
knowledge_engine.services.document_service import DocumentService" and ensure no
other references to DocumentService remain in the module (or replace any
accidental usage with the correct service). After removal, run linters/tests to
confirm no import-related errors.
In `@knowledge_runtime/knowledge_runtime/services/content_fetcher.py`:
- Around line 213-224: The Content-Disposition parsing in content_fetcher.py
(the block that inspects response.headers.get("Content-Disposition")) fails for
RFC5987 and multi-param headers; update the logic in that parsing block (or the
surrounding function that extracts filename/extension) to robustly parse
parameters by splitting the header into params on ";" and handling both filename
and filename* keys (use RFC5987 decoding via urllib.parse.unquote and
charset/lang stripping), and ensure you stop at param boundaries (trim trailing
params like size=) before extracting the filename and extension; alternatively
replace the custom parse with a standard parser such as
email.message.Message.get_param / cgi.parse_header or an rfc6266 helper to
correctly handle quoted filenames and encoded filename* values.
In `@knowledge_runtime/knowledge_runtime/services/query_executor.py`:
- Line 56: The sort key in query_executor.py currently treats falsy scores (like
0.0 or negative values) as missing; update the lambda used in the
all_records.sort call inside the relevant query execution logic to explicitly
check for None and only substitute a default when score is None (i.e., return
the record's score when it is not None, otherwise return the fallback value), so
zero and negative scores are preserved in ordering.
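The None-only fallback can be sketched as a key function over dict-shaped records (an illustrative shape, not the runtime's actual record type):

```python
def sort_score(record: dict) -> float:
    # Substitute the fallback only when score is actually None; 0.0 and
    # negative scores are real values and must keep their ordering.
    score = record.get("score")
    return score if score is not None else float("-inf")

records = [{"id": 1, "score": 0.0}, {"id": 2, "score": None}, {"id": 3, "score": -0.5}]
records.sort(key=sort_score, reverse=True)
```

After sorting, the zero score ranks above the negative one, and only the None record falls to the end.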
In `@knowledge_runtime/start.sh`:
- Around line 244-248: The post-command exit-code check after running "uv sync"
is dead because "set -e" causes the script to exit on any non-zero status;
either remove the redundant "if [ $? -ne 0 ]" block entirely or change the
construct to explicitly handle failure by running "if ! uv sync; then" and
keeping the echo/exit handling; update the block surrounding the "uv sync"
invocation (and any related echo/exit lines) accordingly so there is no
unreachable code under the existing "set -e" behavior.
In `@knowledge_runtime/tests/test_logging.py`:
- Around line 66-206: Tests mutate the global root logger and leak file
handlers; add an autouse pytest fixture in the test module that snapshots the
root logger and any relevant child loggers (e.g., "uvicorn.access", "uvicorn",
"fastapi", "httpx", "httpcore") before each test, clears handlers and sets
desired state, and after the test closes any file/stream handlers, restores the
original handlers, levels and propagate flags so setup_logging (and tests like
test_setup_logging_file_mode, test_setup_logging_access_log_separate_file,
test_setup_logging_access_log_console_mode) cannot leak open file descriptors or
mutated logger state across tests.
⛔ Files ignored due to path filters (3)
- backend/uv.lock is excluded by !**/*.lock
- chat_shell/uv.lock is excluded by !**/*.lock
- knowledge_runtime/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (68)
.env.example, AGENTS.md, backend/.env.example, backend/app/api/endpoints/adapter/retrievers.py, backend/app/api/endpoints/internal/rag.py, backend/app/api/endpoints/rag.py, backend/app/core/config.py, backend/app/services/knowledge/orchestrator.py, backend/app/services/rag/README.md, backend/app/services/rag/embedding/__init__.py, backend/app/services/rag/embedding/factory.py, backend/app/services/rag/gateway_factory.py, backend/app/services/rag/local_data_plane/__init__.py, backend/app/services/rag/local_data_plane/administration.py, backend/app/services/rag/local_data_plane/indexing.py, backend/app/services/rag/local_data_plane/retrieval.py, backend/app/services/rag/local_gateway.py, backend/app/services/rag/remote_gateway.py, backend/app/services/rag/retrieval_service.py, backend/app/services/rag/runtime_resolver.py, backend/pyproject.toml, backend/tests/api/endpoints/internal/test_rag_retrieve_endpoint.py, backend/tests/api/endpoints/test_rag_api.py, backend/tests/core/test_config.py, backend/tests/integration/test_rag_remote_mode.py, backend/tests/services/rag/test_local_data_plane_indexing.py, backend/tests/services/rag/test_local_data_plane_retrieval.py, backend/tests/services/rag/test_local_gateway.py, backend/tests/services/rag/test_remote_gateway.py, backend/tests/services/rag/test_retrieval_service.py, chat_shell/pyproject.toml, docker-compose.build.yml, docker-compose.yml, docker/knowledge_runtime/Dockerfile, docs/plans/2026-04-19-knowledge-runtime-architecture.md, knowledge_runtime/.env.example, knowledge_runtime/knowledge_runtime/__init__.py, knowledge_runtime/knowledge_runtime/api/__init__.py, knowledge_runtime/knowledge_runtime/api/endpoints/__init__.py, knowledge_runtime/knowledge_runtime/api/endpoints/admin.py, knowledge_runtime/knowledge_runtime/api/endpoints/health.py, knowledge_runtime/knowledge_runtime/api/endpoints/index.py, knowledge_runtime/knowledge_runtime/api/endpoints/query.py, knowledge_runtime/knowledge_runtime/api/router.py, knowledge_runtime/knowledge_runtime/config.py, knowledge_runtime/knowledge_runtime/core/__init__.py, knowledge_runtime/knowledge_runtime/core/logging.py, knowledge_runtime/knowledge_runtime/main.py, knowledge_runtime/knowledge_runtime/middleware/__init__.py, knowledge_runtime/knowledge_runtime/middleware/auth.py, knowledge_runtime/knowledge_runtime/services/__init__.py, knowledge_runtime/knowledge_runtime/services/admin_executor.py, knowledge_runtime/knowledge_runtime/services/content_fetcher.py, knowledge_runtime/knowledge_runtime/services/index_executor.py, knowledge_runtime/knowledge_runtime/services/query_executor.py, knowledge_runtime/pyproject.toml, knowledge_runtime/start.sh, knowledge_runtime/tests/__init__.py, knowledge_runtime/tests/conftest.py, knowledge_runtime/tests/test_admin_executor.py, knowledge_runtime/tests/test_auth.py, knowledge_runtime/tests/test_content_fetcher.py, knowledge_runtime/tests/test_index_executor.py, knowledge_runtime/tests/test_logging.py, knowledge_runtime/tests/test_query_executor.py, shared/models/__init__.py, shared/models/knowledge_runtime_protocol.py, start.sh
💤 Files with no reviewable changes (12)
- backend/app/services/rag/local_data_plane/__init__.py
- backend/tests/core/test_config.py
- chat_shell/pyproject.toml
- backend/app/services/rag/embedding/__init__.py
- backend/tests/services/rag/test_local_data_plane_retrieval.py
- backend/tests/services/rag/test_local_gateway.py
- backend/app/services/rag/local_data_plane/administration.py
- backend/app/services/rag/local_data_plane/retrieval.py
- backend/tests/services/rag/test_local_data_plane_indexing.py
- backend/app/services/rag/local_data_plane/indexing.py
- backend/app/services/rag/local_gateway.py
- backend/app/services/rag/embedding/factory.py
```python
@pytest.mark.asyncio
async def test_gateway_no_auth_header_when_token_empty(mocker) -> None:
    """Test that gateway does not add Authorization header when token is empty."""
    post_mock = mocker.patch(
        "httpx.AsyncClient.post",
        return_value=_build_response(
            url="http://knowledge-runtime/internal/rag/query",
            status_code=200,
            json_body={"records": [], "total": 0, "total_estimated_tokens": 0},
        ),
    )
    gateway = RemoteRagGateway(
        base_url="http://knowledge-runtime",
        auth_token="",
    )
    spec = QueryRuntimeSpec(
        knowledge_base_ids=[1],
        query="test",
        knowledge_base_configs=[
            QueryKnowledgeBaseRuntimeConfig(
                knowledge_base_id=1,
                index_owner_user_id=1,
                retriever_config=RuntimeRetrieverConfig(
                    name="test",
                    namespace="default",
                    storage_config={"type": "elasticsearch", "url": "http://es:9200"},
                ),
                embedding_model_config=RuntimeEmbeddingModelConfig(
                    model_name="test",
                    model_namespace="default",
                    resolved_config={"protocol": "openai", "model_id": "test-model"},
                ),
                retrieval_config=RuntimeRetrievalConfig(),
            )
        ],
    )

    await gateway.query(spec)

    args, kwargs = post_mock.await_args
    # When token is empty, headers should be empty dict
    assert kwargs.get("headers") == {}
```
Potential brittleness: empty-string auth_token test relies on fallback to settings.INTERNAL_SERVICE_TOKEN being empty.
test_gateway_no_auth_header_when_token_empty does not reset settings.INTERNAL_SERVICE_TOKEN. If a developer has a non-empty value configured (or another test leaks state without teardown), the assertion kwargs.get("headers") == {} will fail depending on whether the gateway treats auth_token="" as "explicitly no auth" or falls back to the settings token. Consider using monkeypatch.setattr(settings, "INTERNAL_SERVICE_TOKEN", "") here too to isolate the test, mirroring test_gateway_uses_settings_token_when_not_provided.
🧰 Tools
🪛 Ruff (0.15.10)
[warning] 756-756: Unpacked variable args is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/tests/services/rag/test_remote_gateway.py` around lines 717 - 758,
The test test_gateway_no_auth_header_when_token_empty is brittle because it
assumes settings.INTERNAL_SERVICE_TOKEN is empty; before creating
RemoteRagGateway set settings.INTERNAL_SERVICE_TOKEN to "" (use
monkeypatch.setattr(settings, "INTERNAL_SERVICE_TOKEN", "")) to isolate the test
and ensure RemoteRagGateway(auth_token="") does not fall back to the settings
token—mirror the approach used in
test_gateway_uses_settings_token_when_not_provided to guarantee the headers
assertion remains deterministic.
```dockerfile
# Create log directory
RUN mkdir -p /app/logs

# Expose port
EXPOSE ${PORT}

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:${PORT}/internal/rag/health || exit 1

# Start command
CMD ["sh", "-c", "exec uvicorn knowledge_runtime.main:app --host ${HOST} --port ${PORT}"]
```
Run the runtime container as a non-root user.
The service currently starts as root. Add an unprivileged user and ensure /app/logs remains writable before CMD.
Container hardening example:

```diff
 # Create log directory
-RUN mkdir -p /app/logs
+RUN mkdir -p /app/logs && \
+    groupadd --system app && \
+    useradd --system --gid app --home-dir /app --shell /usr/sbin/nologin app && \
+    chown -R app:app /app

 # Expose port
 EXPOSE ${PORT}

 # Health check
 HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
     CMD curl -f http://localhost:${PORT}/internal/rag/health || exit 1

+USER app
+
 # Start command
 CMD ["sh", "-c", "exec uvicorn knowledge_runtime.main:app --host ${HOST} --port ${PORT}"]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docker/knowledge_runtime/Dockerfile` around lines 31 - 42, Add an
unprivileged user and switch to it before the container starts: create a
user/group (e.g., appuser), ensure /app and /app/logs are owned and writable by
that user (chown/chmod as needed), then add a USER appuser line prior to CMD so
uvicorn runs unprivileged; keep the existing HEALTHCHECK and EXPOSE lines
unchanged but ensure the health endpoint is reachable by the non-root user.
- **Service decoupling**: create a standalone `knowledge_runtime` service that Backend calls over HTTP
- **Independent scaling**: the RAG compute layer can scale horizontally on its own
- **Backward compatibility**: keep `LocalRagGateway` as a fallback to support gradual cutover
- **Protocol transparency**: reuse the existing `knowledge_runtime_protocol.py` communication models

---

## 架构概览 (Architecture Overview)

```
Backend (control plane)
  HTTP API        MCP Tools        Internal API
      └───────────────┬────────────────┘
                      ▼
             ┌─────────────────┐
             │  Orchestrator   │  ← permission checks, request routing, result assembly
             └────────┬────────┘
                      ▼
             ┌─────────────────┐
             │ Gateway Factory │  ← local/remote mode selection
             └────────┬────────┘
                      │
    ┌─────────────────┼─────────────────┐
    ▼                 ▼                 ▼
┌───────────────┐ ┌────────────────┐ ┌──────────┐
│LocalRagGateway│ │RemoteRagGateway│ │ Fallback │
│  (fallback)   │ │ (primary path) │ │ Handler  │
└───────────────┘ └───────┬────────┘ └──────────┘
```
Doc contradicts PR intent: still describes LocalRagGateway + RAG_RUNTIME_MODE as active components.
This PR's goal is to remove LocalRagGateway, RAG_RUNTIME_MODE, and the remote→local fallback handler (see backend/app/core/config.py, backend/app/services/rag/gateway_factory.py, and the updated tests). However, the design doc being added in the same PR still presents them as core pieces:
- Lines 27–28: "keep `LocalRagGateway` as a fallback, supporting gradual cutover"
- Lines 48–56: the architecture diagram includes `LocalRagGateway` and a `Fallback Handler`
- Line 89: "select local/remote based on `RAG_RUNTIME_MODE`"
- Lines 186–224: a full section on `RAG_RUNTIME_MODE` configuration, gradual-cutover phases, and the fallback flow
- Lines 280–281: references to `backend/app/services/rag/local_gateway.py` and `gateway_factory.py`'s local/remote mode selection; the former file is deleted in this PR
Please update the document to reflect the final state (remote-only, no RAG_RUNTIME_MODE, no local fallback) or, if this is intentionally a historical design record, move it under an explicitly dated/superseded location and add a banner noting it describes an intermediate phase. As the doc stands, a future reader reconciling it against the code will be actively misled.
🧰 Tools
🪛 markdownlint-cli2 (0.22.0)
[warning] 34-34: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
## 架构概览

(followed by an opening code fence with no language identifier)
Specify languages on fenced code blocks.
markdownlint MD040 flags the fenced blocks at lines 34, 210, and 249 for missing language identifiers. Consider using text (or a more specific identifier) for the ASCII diagrams and flow charts.
🧰 Tools
🪛 markdownlint-cli2 (0.22.0)
[warning] 34-34: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/plans/2026-04-19-knowledge-runtime-architecture.md` at line 34, The
fenced code blocks in the document containing ASCII diagrams and flow charts are
missing language identifiers (causing markdownlint MD040); update each fenced
block that contains ASCII art or flow charts to include an appropriate language
tag such as "text" or a more specific identifier (e.g., "graphviz" or "mermaid"
if applicable) so the blocks read as ```text (or ```graphviz/```mermaid) and
satisfy MD040 while preserving the diagrams' rendering.
```python
@retry(
    retry=retry_if_exception_type(httpx.TransportError),
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=1, max=10),
    reraise=True,
)
async def fetch(
    self,
    content_ref: ContentRef,
) -> tuple[bytes, str, str]:
    """Fetch binary content from a ContentRef.

    Args:
        content_ref: The content reference to fetch.

    Returns:
        Tuple of (binary_data, source_file, file_extension).

    Raises:
        ContentFetchError: If fetching fails.
    """
    if isinstance(content_ref, PresignedUrlContentRef):
        return await self._fetch_from_presigned_url(content_ref)
    elif isinstance(content_ref, BackendAttachmentStreamContentRef):
        return await self._fetch_from_backend_stream(content_ref)
    else:
        raise ContentFetchError(
            f"Unsupported content ref type: {type(content_ref).__name__}",
            retryable=False,
        )

async def _fetch_from_presigned_url(
    self,
    content_ref: PresignedUrlContentRef,
) -> tuple[bytes, str, str]:
    """Fetch content directly from a presigned URL.

    Args:
        content_ref: Presigned URL content reference.

    Returns:
        Tuple of (binary_data, source_file, file_extension).
    """
    url = content_ref.url
    logger.debug(f"Fetching content from presigned URL: {url[:100]}...")

    try:
        async with httpx.AsyncClient(
            timeout=self._settings.content_fetch_timeout
        ) as client:
            response = await client.get(url)
            response.raise_for_status()

            # Extract filename and extension from URL if possible
            source_file, file_extension = self._extract_filename_from_url(url)

            return response.content, source_file, file_extension

    except httpx.HTTPStatusError as exc:
        logger.error(f"HTTP error fetching from presigned URL: {exc}")
        raise ContentFetchError(
            f"Failed to fetch content: HTTP {exc.response.status_code}",
            retryable=exc.response.status_code >= 500,
            details={"url": url[:100], "status_code": exc.response.status_code},
        ) from exc
    except httpx.TransportError as exc:
        logger.error(f"Transport error fetching from presigned URL: {exc}")
        raise ContentFetchError(
            f"Transport error fetching content: {exc}",
            retryable=True,
            details={"url": url[:100]},
        ) from exc

async def _fetch_from_backend_stream(
    self,
    content_ref: BackendAttachmentStreamContentRef,
) -> tuple[bytes, str, str]:
    """Fetch content through Backend attachment stream endpoint.

    Args:
        content_ref: Backend attachment stream content reference.

    Returns:
        Tuple of (binary_data, source_file, file_extension).
    """
    url = content_ref.url
    auth_token = content_ref.auth_token

    logger.debug(f"Fetching content from Backend stream: {url}")

    try:
        headers = {"Authorization": f"Bearer {auth_token}"}

        async with httpx.AsyncClient(
            timeout=self._settings.content_fetch_timeout
        ) as client:
            response = await client.get(url, headers=headers)
            response.raise_for_status()

            # Try to get filename from Content-Disposition header
            source_file, file_extension = self._extract_filename_from_response(
                response, url
            )

            return response.content, source_file, file_extension

    except httpx.HTTPStatusError as exc:
        logger.error(f"HTTP error fetching from Backend stream: {exc}")
        raise ContentFetchError(
            f"Failed to fetch content from Backend: HTTP {exc.response.status_code}",
            retryable=exc.response.status_code >= 500,
            details={"url": url, "status_code": exc.response.status_code},
        ) from exc
    except httpx.TransportError as exc:
        logger.error(f"Transport error fetching from Backend stream: {exc}")
        raise ContentFetchError(
            f"Transport error fetching content from Backend: {exc}",
            retryable=True,
            details={"url": url},
        ) from exc
```
Retry decorator on fetch() is effectively dead code.
The @retry(retry=retry_if_exception_type(httpx.TransportError), ...) on fetch() (lines 56–61) never fires because the inner methods _fetch_from_presigned_url (line 121) and _fetch_from_backend_stream (line 169) already catch httpx.TransportError and re-raise it as ContentFetchError. By the time the exception reaches the decorator, it no longer matches httpx.TransportError, so tenacity never retries.
Pick one of:
- Retry on
ContentFetchErrorwithretryable=Trueinstead. - Let
httpx.TransportErrorpropagate from the inner methods (remove theexcept httpx.TransportErrorbranches) and wrap it intoContentFetchErroronly after retries are exhausted.
🛠️ Suggested fix (option 1)

```diff
 @retry(
-    retry=retry_if_exception_type(httpx.TransportError),
+    retry=retry_if_exception_type(ContentFetchError)
+    & retry_if_result_is_retryable,  # or a custom predicate
     stop=stop_after_attempt(3),
     wait=wait_exponential(multiplier=1, min=1, max=10),
     reraise=True,
 )
```

or, more idiomatically, use tenacity.retry_if_exception with a predicate that inspects exc.retryable:

```python
from tenacity import retry_if_exception

def _is_retryable(exc: BaseException) -> bool:
    return isinstance(exc, ContentFetchError) and exc.retryable

@retry(
    retry=retry_if_exception(_is_retryable),
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=1, max=10),
    reraise=True,
)
```

🤖 Prompt for AI Agents
)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@knowledge_runtime/knowledge_runtime/services/content_fetcher.py` around lines
56 - 175, The retry on fetch() currently only retries httpx.TransportError but
both _fetch_from_presigned_url and _fetch_from_backend_stream catch
TransportError and raise ContentFetchError, so tenacity never retries; update
fetch() to retry based on ContentFetchError.retryable by replacing the
decorator’s retry argument with a predicate (e.g., use
tenacity.retry_if_exception with a helper function _is_retryable that returns
isinstance(exc, ContentFetchError) and exc.retryable) so fetch() will retry when
inner methods raise ContentFetchError(retryable=True); keep the inner handlers
raising ContentFetchError as they are and only change the `@retry`(...) on fetch()
to reference the new predicate.
```toml
name = "wegent-knowledge-runtime"
version = "1.0.0"
description = "Wegent Knowledge Runtime HTTP Service"
requires-python = ">=3.10,<=3.13"
```
🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
python -m pip install --quiet packaging
python - <<'PY'
from packaging.specifiers import SpecifierSet
from packaging.version import Version

spec = SpecifierSet(">=3.10,<=3.13")
for version in ["3.13.0", "3.13.1", "3.13.9", "3.14.0"]:
    print(version, Version(version) in spec)
PY
```

Repository: wecode-ai/Wegent
Length of output: 110
**Use `<3.14` to include all Python 3.13 patch releases.**

`<=3.13` excludes versions like 3.13.1 and later patch releases; use `<3.14` to support Python 3.13 generally.

Suggested change

```diff
-requires-python = ">=3.10,<=3.13"
+requires-python = ">=3.10,<3.14"
```

🤖 Prompt for AI Agents
+requires-python = ">=3.10,<3.14"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@knowledge_runtime/pyproject.toml` at line 9, Update the requires-python spec
in pyproject.toml to allow all Python 3.13 patch releases by replacing the
current constraint requires-python = ">=3.10,<=3.13" with a range using an
exclusive upper bound, e.g. requires-python = ">=3.10,<3.14", so the
requires-python field correctly includes Python 3.13.x versions.
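The exclusion is easy to see without the `packaging` dependency. A simplified stand-in for PEP 440 specifier membership (plain release versions only, padded to three components because `3.13` and `3.13.0` compare equal under PEP 440):

```python
def parse(version: str) -> tuple[int, ...]:
    """Parse a plain release version, padding so '3.13' == '3.13.0'."""
    parts = [int(p) for p in version.split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def allowed_old(version: str) -> bool:
    """Mimics '>=3.10,<=3.13' for plain release versions."""
    return parse("3.10") <= parse(version) <= parse("3.13")


def allowed_new(version: str) -> bool:
    """Mimics the suggested '>=3.10,<3.14'."""
    return parse("3.10") <= parse(version) < parse("3.14")


print(allowed_old("3.13.0"))  # True
print(allowed_old("3.13.1"))  # False: (3, 13, 1) > (3, 13, 0)
print(allowed_new("3.13.1"))  # True
print(allowed_new("3.14.0"))  # False
```

The inclusive `<=3.13` bound compares against the full version tuple, which is why every 3.13 patch release after 3.13.0 falls outside it.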
```python
import pytest


@pytest.fixture
def mock_storage_backend():
    """Create a mock storage backend for testing."""
    from unittest.mock import MagicMock

    backend = MagicMock()
    backend.test_connection.return_value = True
    backend.retrieve.return_value = {"records": []}
    backend.delete_document.return_value = {"status": "success", "deleted_chunks": 0}
    backend.delete_knowledge.return_value = {"status": "success", "deleted_count": 0}
    backend.drop_knowledge_index.return_value = {"status": "success"}
    backend.get_all_chunks.return_value = []
    return backend


@pytest.fixture
def mock_embed_model():
    """Create a mock embedding model for testing."""
    from unittest.mock import MagicMock

    model = MagicMock()
    return model
```
**Add return type hints to the pytest fixtures.**

Both fixture functions are untyped; annotate them and move `MagicMock` to a normal import.

🧪 Proposed typing fix

```diff
+from unittest.mock import MagicMock
+
 import pytest


 @pytest.fixture
-def mock_storage_backend():
+def mock_storage_backend() -> MagicMock:
     """Create a mock storage backend for testing."""
-    from unittest.mock import MagicMock
-
     backend = MagicMock()
     backend.test_connection.return_value = True
     backend.retrieve.return_value = {"records": []}
@@
 @pytest.fixture
-def mock_embed_model():
+def mock_embed_model() -> MagicMock:
     """Create a mock embedding model for testing."""
-    from unittest.mock import MagicMock
-
-    model = MagicMock()
-    return model
+    return MagicMock()
```

As per coding guidelines, `**/*.py`: Follow PEP 8, use Black formatter (line length: 88), isort for import organization, and require type hints.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@knowledge_runtime/tests/conftest.py` around lines 7 - 31, Add explicit return
type hints and move the MagicMock import to the module top: import MagicMock
from unittest.mock once at the top of the file, then annotate the two fixtures
mock_storage_backend and mock_embed_model with return types (e.g., -> MagicMock)
so the fixtures are typed; keep fixture bodies identical but remove the inline
imports and ensure isort/Black formatting rules are followed.
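Either way, the fixture body is plain `MagicMock` configuration. A standalone sketch of what tests receive (written as a plain typed function rather than a pytest fixture so it runs anywhere; the call arguments are illustrative):

```python
from unittest.mock import MagicMock


def make_storage_backend() -> MagicMock:
    """Mirror of the fixture's setup as a plain typed function."""
    backend = MagicMock()
    backend.test_connection.return_value = True
    backend.retrieve.return_value = {"records": []}
    return backend


backend = make_storage_backend()
print(backend.test_connection())         # True
print(backend.retrieve("idx", top_k=3))  # {'records': []}

# MagicMock also records calls for later verification:
backend.retrieve.assert_called_once_with("idx", top_k=3)
```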
- Backend and chat_shell no longer depend on knowledge_engine or llama-index
- RAG operations go through RemoteRagGateway to the knowledge_runtime service
- Update Docker files to remove the knowledge_engine copy step
- Update stale comments
Actionable comments posted: 1
🧹 Nitpick comments (2)
backend/app/api/endpoints/adapter/retrievers.py (2)
**38-39: Add `@trace_async()` to the new async endpoints.**

Per coding guidelines, async functions in `backend/**/*.py` should be decorated with `@trace_async()` from `shared/telemetry/decorators.py`. The two endpoints converted to async here (`get_storage_retrieval_methods`, `get_storage_type_retrieval_methods`) are missing it, while `test_retriever_connection` below is also async and similarly untraced; worth addressing consistently.

As per coding guidelines: "Use OpenTelemetry tracing decorators: `@trace_async()` for async functions, `@trace_sync()` for sync functions from `shared/telemetry/decorators.py`".

Also applies to: 74-75
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/endpoints/adapter/retrievers.py` around lines 38 - 39, The async endpoints get_storage_retrieval_methods, get_storage_type_retrieval_methods and test_retriever_connection are missing OpenTelemetry tracing; import and apply the async tracing decorator by adding `@trace_async`() (from shared/telemetry/decorators.py) directly above each async def to ensure they are traced, and ensure the decorator import is added/merged with existing imports if not already present.
**296-296: Nit: use f-string conversion flag (RUF010).**

```diff
-    logger.error(f"Retriever connection test failed: {str(e)}")
+    logger.error(f"Retriever connection test failed: {e!s}")
```

Same applies to the identical line at 300.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/endpoints/adapter/retrievers.py` at line 296, Replace the f-string that calls str(e) in the logger.error call with the f-string conversion flag so the exception is logged using its repr; specifically update the logger.error invocation that currently reads "logger.error(f\"Retriever connection test failed: {str(e)}\")" (the two identical occurrences in retrievers.py) to use "{e!r}" in the f-string so the log becomes more accurate and stable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/api/endpoints/adapter/retrievers.py`:
- Around line 96-108: The comparison in the loop currently lowercases only the
input storage_type but compares it to type_info.type as-is, causing false
negatives for case differences; update the match in the loop that iterates
response.storage_types to perform a case-insensitive comparison (e.g., compare
storage_type.casefold() to type_info.type.casefold()) so supported types like
"Elasticsearch" will be detected, and ensure the returned "storage_type"
preserves the original requested value or normalized form as intended
(references: response.storage_types, type_info.type, storage_type in
retrievers.py).
---
Nitpick comments:
In `@backend/app/api/endpoints/adapter/retrievers.py`:
- Around line 38-39: The async endpoints get_storage_retrieval_methods,
get_storage_type_retrieval_methods and test_retriever_connection are missing
OpenTelemetry tracing; import and apply the async tracing decorator by adding
`@trace_async`() (from shared/telemetry/decorators.py) directly above each async
def to ensure they are traced, and ensure the decorator import is added/merged
with existing imports if not already present.
- Line 296: Replace the f-string that calls str(e) in the logger.error call with
the f-string conversion flag so the exception is logged using its repr;
specifically update the logger.error invocation that currently reads
"logger.error(f\"Retriever connection test failed: {str(e)}\")" (the two
identical occurrences in retrievers.py) to use "{e!r}" in the f-string so the
log becomes more accurate and stable.
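On the case-insensitivity fix above: `str.casefold()` is preferable to `lower()` because it also normalizes characters that `lower()` leaves distinct. A quick illustration:

```python
def matches(requested: str, available: str) -> bool:
    """Case-insensitive storage-type comparison, per the suggested fix."""
    return requested.casefold() == available.casefold()


print(matches("Elasticsearch", "elasticsearch"))  # True
print(matches("QDRANT", "qdrant"))                # True

# casefold() is more aggressive than lower() for non-ASCII input:
print("Straße".casefold() == "STRASSE".casefold())  # True; lower() would miss this
```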
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3e63f4cb-65c8-42f5-9139-64f36dae973c
📒 Files selected for processing (6)
- AGENTS.md
- backend/app/api/api.py
- backend/app/api/endpoints/adapter/retrievers.py
- backend/app/api/endpoints/internal/__init__.py
- docker/backend/Dockerfile
- docker/standalone/Dockerfile
💤 Files with no reviewable changes (1)
- docker/backend/Dockerfile
✅ Files skipped from review due to trivial changes (3)
- backend/app/api/endpoints/internal/__init__.py
- backend/app/api/api.py
- AGENTS.md
- backend: make storage type comparison case-insensitive
- knowledge_runtime: fix content fetcher retry logic
- knowledge_runtime: fix the script working directory in start.sh
- start.sh: export INTERNAL_SERVICE_TOKEN when starting backend
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
start.sh (2)
**1016-1055: 🛠️ Refactor suggestion | 🟠 Major**

**Extract service metadata instead of extending the monolithic script.**

`knowledge_runtime` is now wired through repeated service lists, aliases, ports, dependency sync, startup, health, status, and log output. This makes each future service change require edits in many places across an already 1,700+ line script. Consider a single service registry/helper layer with fields like `name`, `aliases`, `port`, `dir`, `start_cmd`, and `health_path`.

As per coding guidelines, "File size limit: If a file exceeds 1000 lines, split it into multiple sub-modules" and "Extract common logic from ALL code."
Also applies to: 1251-1251, 1349-1389, 1451-1706
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@start.sh` around lines 1016 - 1055, Replace the repeated per-service lists and case branches with a single service registry data structure (e.g., an associative array or arrays of service objects) that holds fields like name, aliases, port, dir, start_cmd, and health_path; update the stop logic that currently uses all_services/all_ports and the parsing case (references: all_services, all_ports, services, service_ports, and the knowledge_runtime/aliases) to look up services by alias and populate services/service_ports from the registry, then refactor all other repeated sections (dependency sync, startup, health checks, status, logs) to iterate the same registry keys instead of duplicating code; finally, move the registry and service-agnostic helper functions into a new sourced module to respect the file-size guideline (split the monolithic start.sh into main + services.sh and update callers to source the new module).
**1316-1323: ⚠️ Potential issue | 🟠 Major**

**Add `--fail` flag and authentication to health check curl commands.**

The current health check uses `curl -s`, which exits with success (code 0) for HTTP 401 and 500 responses. This masks authentication failures and service errors: the knowledge runtime health check could report healthy while the service is actually rejecting requests. Add the `--fail` flag to make curl fail on HTTP errors, and include the `INTERNAL_SERVICE_TOKEN` authorization header when configured.

Suggested fix

```diff
     echo -n "  Checking $name..."
+    local curl_args=(-fsS --connect-timeout 2)
+    if [ -n "$INTERNAL_SERVICE_TOKEN" ]; then
+        curl_args+=(-H "Authorization: Bearer $INTERNAL_SERVICE_TOKEN")
+    fi
     for ((i=1; i<=max_retries; i++)); do
         # Try health endpoint first if provided
         if [ -n "$health_path" ]; then
-            if curl -s --connect-timeout 2 "http://localhost:$port$health_path" >/dev/null 2>&1; then
+            if curl "${curl_args[@]}" "http://localhost:$port$health_path" >/dev/null 2>&1; then
                 echo -e " ${GREEN}✓${NC} healthy (port $port)"
                 return 0
             fi
@@
-    check_service_health "knowledge_runtime" $KNOWLEDGE_RUNTIME_PORT "/internal/rag/health" || failed=1
+    check_service_health "knowledge_runtime" "$KNOWLEDGE_RUNTIME_PORT" "/internal/rag/health" || failed=1
```

Also applies to: 1652-1654
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@start.sh` around lines 1316 - 1323, Update the health-check curl invocations in the retry loop that checks "$health_path" (uses variables health_path, port, max_retries) to pass --fail so HTTP 4xx/5xx cause a non-zero exit, and include an Authorization header when INTERNAL_SERVICE_TOKEN is set (e.g., add -H "Authorization: Bearer $INTERNAL_SERVICE_TOKEN" only if INTERNAL_SERVICE_TOKEN is non-empty). Make the same change to the other health-check curl usage later in the script (the block around the second occurrence referenced by the reviewer). Ensure the command preserves existing flags like --connect-timeout and silent mode.
🧹 Nitpick comments (3)
knowledge_runtime/knowledge_runtime/services/content_fetcher.py (2)
**192-202: Minor: filenames from presigned URLs are not percent-decoded.**

Object-storage keys with spaces or non-ASCII characters come back URL-encoded (e.g. `my%20report.pdf`), and the returned `source_file` will keep the encoding. Consider `urllib.parse.unquote(filename)` before returning.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_runtime/knowledge_runtime/services/content_fetcher.py` around lines 192 - 202, The filename extracted from a URL (variables path, filename, extension and the return) can be percent-encoded (e.g. "my%20report.pdf"); decode it before computing or returning the extension. Use urllib.parse.unquote on the extracted filename (after parts[-1]) so filename becomes human-readable, then derive extension from that decoded filename and return the decoded filename and extension.
**61-85: Missing OpenTelemetry tracing on async methods.**

Per repo guidelines, async functions should be decorated with `@trace_async()` from `shared/telemetry/decorators.py`. The three async methods here (`fetch`, `_fetch_from_presigned_url`, `_fetch_from_backend_stream`) perform network I/O and are exactly the spans you'll want when debugging ingestion failures, but none are traced. Worth adding, plus `set_span_attribute`/`add_span_event` for `content_ref.kind`, status codes, and retry attempts.

As per coding guidelines: "Use OpenTelemetry tracing decorators from 'shared/telemetry/decorators.py': '@trace_async()' for async functions, '@trace_sync()' for sync functions, 'add_span_event()' for events, 'set_span_attribute()' for attributes".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_runtime/knowledge_runtime/services/content_fetcher.py` around lines 61 - 85, Decorate the three async methods fetch, _fetch_from_presigned_url, and _fetch_from_backend_stream with `@trace_async`() from shared/telemetry/decorators.py and instrument them by calling set_span_attribute for content_ref.kind (and any returned status/code) and add_span_event for retry attempts and key events (e.g., request start/finish and error details); ensure the imports for trace_async, set_span_attribute, and add_span_event are added and that exceptions still propagate (reraise behavior unchanged) so existing retry logic on fetch remains intact.

backend/app/api/endpoints/adapter/retrievers.py (1)
**38-39: Add tracing and return annotations to the changed async endpoints.**

These route handlers are now async, but they lack `@trace_async()` and explicit return types.

Proposed update

```diff
-from typing import Optional
+from typing import Any, Optional
@@
 from shared.models import RuntimeRetrieverConfig
+from shared.telemetry.decorators import trace_async
@@
 @router.get("/storage-types/retrieval-methods")
-async def get_storage_retrieval_methods():
+@trace_async()
+async def get_storage_retrieval_methods() -> dict[str, Any]:
@@
 @router.get("/storage-types/{storage_type}/retrieval-methods")
-async def get_storage_type_retrieval_methods(storage_type: str):
+@trace_async()
+async def get_storage_type_retrieval_methods(storage_type: str) -> dict[str, Any]:
```

As per coding guidelines, Python code: Follow PEP 8, use Black formatter (line length: 88), isort, and type hints are required; Use OpenTelemetry tracing decorators from 'shared/telemetry/decorators.py': '@trace_async()' for async functions, '@trace_sync()' for sync functions, 'add_span_event()' for events, 'set_span_attribute()' for attributes.
Also applies to: 74-75
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/endpoints/adapter/retrievers.py` around lines 38 - 39, Add the OpenTelemetry async tracer and explicit return type annotations to the async route handlers: decorate get_storage_retrieval_methods (and the other async handler at the referenced lines) with `@trace_async`() from shared.telemetry.decorators, add an appropriate explicit return type hint (for example -> list[str] or -> dict[str, Any] depending on the actual returned payload) to each async function signature, and import trace_async at the top of the module; then run Black/isort to satisfy formatting and import ordering.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/api/endpoints/adapter/retrievers.py`:
- Line 18: Replace direct instantiation of RemoteRagGateway() with the existing
gateway factory used by test_retriever_connection (use the same factory call
rather than new RemoteRagGateway()), and wrap calls to get_storage_types() (and
any other remote gateway calls at the three locations noted) in try/except that
catches RemoteRagGatewayError and maps it to a 503 response (e.g., raise/return
an HTTP 503 with a clear message) instead of allowing a generic 500; reference
RemoteRagGateway, RemoteRagGatewayError and get_storage_types to locate the
changes.
- Line 296: Replace f-string interpolation inside logger calls with lazy
logging: change the call using logger.error(f"Retriever connection test failed:
{str(e)}") to logger.error("Retriever connection test failed: %s", e). For the
non-logger usage mentioned (the line 301 case), avoid f-strings by building the
message into a variable first (e.g., msg = "Retriever connection test failed:
{}".format(e) or msg = "Retriever connection test failed: " + str(e)) and then
use that variable where needed (or pass it to raise/return) so no f-string is
used outside logger calls; target the logger.error call and the surrounding code
that references e.
In `@knowledge_runtime/knowledge_runtime/services/content_fetcher.py`:
- Around line 218-229: The Content-Disposition parsing in content_fetcher.py
using content_disposition.split("filename=") is fragile: it mishandles RFC5987
filename* and leaves trailing parameters in the extracted filename, corrupting
filename and extension that flow into index_document_from_binary via
IndexExecutor.execute; replace the hand-rolled logic around content_disposition
/ filename extraction with a robust parser (e.g.,
email.message.Message().get_param('filename', header='content-disposition') or
werkzeug.http.parse_options_header) that correctly handles filename* (RFC5987),
quoted values, and strips subsequent parameters, then derive extension from the
sanitized filename (ensure you only consider the actual filename string when
computing extension) so downstream format detection receives clean values.
In `@knowledge_runtime/start.sh`:
- Around line 58-60: The --backend-url handling currently only updates
BACKEND_URL variable used for display but does not export or persist it when an
existing .env is present, so the existing BACKEND_INTERNAL_URL in the file wins
at runtime; modify the start.sh logic that parses the --backend-url option (the
BACKEND_URL variable and the code paths around exporting BACKEND_INTERNAL_URL)
to always export BACKEND_INTERNAL_URL from the parsed BACKEND_URL value and/or
overwrite the .env entry regardless of whether .env already exists so the flag
takes effect; ensure you update the same handling used in the other occurrences
(around lines handling BACKEND_URL, BACKEND_INTERNAL_URL at 261-280 and 294-295)
so the flag is effective in all code paths.
- Around line 216-229: The startup script currently auto-installs uv by piping a
remote installer into sh when uv is missing; remove that auto-execution and
instead fail fast with clear manual install instructions. In the uv
presence-check block (the conditional using command -v uv), remove the curl ...
| sh invocation and the subsequent sourcing of $HOME/.cargo/env, and replace
them with an explicit error/warning message that prints step-by-step install
instructions (e.g., how to install via the official installer or package
manager) and then exit 1; keep the existing pre-checks and final missing-uv
error path but do not perform any network-executed installs in start.sh.
- Around line 54-56: The script parses the --python arg into PYTHON_PATH but
later invokes the virtualenv creation via the uv venv command without passing
that interpreter; update the uv venv invocation (the command that creates the
venv) to include the user-specified interpreter flag (e.g., add --python
"$PYTHON_PATH" or --python "$PYTHON_EXEC" consistent with the parser) so uv venv
uses the validated interpreter provided to --python.
- Around line 147-150: The current check uses sort -V (which fails on macOS);
replace that logic by invoking the detected python ($python_exec) to compare
REQUIRED_VERSION and PYTHON_VERSION using Python-native version parsing (e.g.,
split on '.' and compare integer tuples or use packaging/distutils if available)
and make the shell conditional depend on that process exit code; update the
block that references PYTHON_VERSION, REQUIRED_VERSION and need_exit to call
$python_exec -c "..." (return 0 if PYTHON_VERSION >= REQUIRED_VERSION, non‑zero
otherwise) and keep the existing need_exit branch unchanged so macOS-compatible
comparisons are used.
In `@start.sh`:
- Around line 58-61: The save_config logic currently removes
KNOWLEDGE_RUNTIME_PORT but leaves an existing KNOWLEDGE_RUNTIME_URL unchanged,
causing URL/port drift; update the script (the save_config/port-writing sections
that include the grep -v lines and the blocks around lines referenced) to also
remove any existing KNOWLEDGE_RUNTIME_URL entry when rewriting .env and then
write a new KNOWLEDGE_RUNTIME_URL derived from the configured
KNOWLEDGE_RUNTIME_PORT (e.g., construct
"http://localhost:${KNOWLEDGE_RUNTIME_PORT}" or use the existing host variable
if present) so the URL always matches the saved port; apply the same change to
the other similar blocks noted (around lines 81-104 and 894-905) so all places
that update KNOWLEDGE_RUNTIME_PORT also refresh KNOWLEDGE_RUNTIME_URL.
---
Outside diff comments:
In `@start.sh`:
- Around line 1016-1055: Replace the repeated per-service lists and case
branches with a single service registry data structure (e.g., an associative
array or arrays of service objects) that holds fields like name, aliases, port,
dir, start_cmd, and health_path; update the stop logic that currently uses
all_services/all_ports and the parsing case (references: all_services,
all_ports, services, service_ports, and the knowledge_runtime/aliases) to look
up services by alias and populate services/service_ports from the registry, then
refactor all other repeated sections (dependency sync, startup, health checks,
status, logs) to iterate the same registry keys instead of duplicating code;
finally, move the registry and service-agnostic helper functions into a new
sourced module to respect the file-size guideline (split the monolithic start.sh
into main + services.sh and update callers to source the new module).
- Around line 1316-1323: Update the health-check curl invocations in the retry
loop that checks "$health_path" (uses variables health_path, port, max_retries)
to pass --fail so HTTP 4xx/5xx cause a non-zero exit, and include an
Authorization header when INTERNAL_SERVICE_TOKEN is set (e.g., add -H
"Authorization: Bearer $INTERNAL_SERVICE_TOKEN" only if INTERNAL_SERVICE_TOKEN
is non-empty). Make the same change to the other health-check curl usage later
in the script (the block around the second occurrence referenced by the
reviewer). Ensure the command preserves existing flags like --connect-timeout
and silent mode.
---
Nitpick comments:
In `@backend/app/api/endpoints/adapter/retrievers.py`:
- Around line 38-39: Add the OpenTelemetry async tracer and explicit return type
annotations to the async route handlers: decorate get_storage_retrieval_methods
(and the other async handler at the referenced lines) with `@trace_async`() from
shared.telemetry.decorators, add an appropriate explicit return type hint (for
example -> list[str] or -> dict[str, Any] depending on the actual returned
payload) to each async function signature, and import trace_async at the top of
the module; then run Black/isort to satisfy formatting and import ordering.
In `@knowledge_runtime/knowledge_runtime/services/content_fetcher.py`:
- Around line 192-202: The filename extracted from a URL (variables path,
filename, extension and the return) can be percent-encoded (e.g.
"my%20report.pdf"); decode it before computing or returning the extension. Use
urllib.parse.unquote on the extracted filename (after parts[-1]) so filename
becomes human-readable, then derive extension from that decoded filename and
return the decoded filename and extension.
- Around line 61-85: Decorate the three async methods fetch,
_fetch_from_presigned_url, and _fetch_from_backend_stream with `@trace_async`()
from shared/telemetry/decorators.py and instrument them by calling
set_span_attribute for content_ref.kind (and any returned status/code) and
add_span_event for retry attempts and key events (e.g., request start/finish and
error details); ensure the imports for trace_async, set_span_attribute, and
add_span_event are added and that exceptions still propagate (reraise behavior
unchanged) so existing retry logic on fetch remains intact.
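On the Content-Disposition point: the stdlib `email.message.Message` parser already handles quoted values, trailing parameters, and RFC 2231/5987 `filename*`, so the hand-rolled `split("filename=")` can be replaced with something like this sketch (the header values are illustrative):

```python
from email.message import Message
from typing import Optional


def filename_from_disposition(header: str) -> Optional[str]:
    """Parse Content-Disposition via the stdlib instead of split('filename=')."""
    msg = Message()
    msg["Content-Disposition"] = header
    # get_filename() strips quotes, ignores other params like size=,
    # and decodes RFC 2231-encoded filename* values.
    return msg.get_filename()


print(filename_from_disposition('attachment; filename="my report.pdf"; size=123'))
# my report.pdf
print(filename_from_disposition("attachment; filename*=utf-8''my%20r%C3%A9port.pdf"))
# my réport.pdf
```

Deriving the extension from this sanitized value gives downstream format detection a clean input.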
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 02f0511a-8df5-49f5-9ac3-70f35d37cebe
📒 Files selected for processing (4)
- backend/app/api/endpoints/adapter/retrievers.py
- knowledge_runtime/knowledge_runtime/services/content_fetcher.py
- knowledge_runtime/start.sh
- start.sh
```diff
 from app.services.rag.gateway_factory import get_query_gateway
-from app.services.rag.local_gateway import LocalRagGateway
-from app.services.rag.remote_gateway import RemoteRagGatewayError
+from app.services.rag.remote_gateway import RemoteRagGateway, RemoteRagGatewayError
```
Use the gateway factory and map remote storage-type failures.
Directly constructing RemoteRagGateway() bypasses the singleton/factory path already used by test_retriever_connection. Also, get_storage_types() failures currently bubble as generic 500s; map RemoteRagGatewayError to a clear 503.
Proposed update

```diff
-from app.services.rag.remote_gateway import RemoteRagGateway, RemoteRagGatewayError
+from app.services.rag.remote_gateway import RemoteRagGatewayError
@@
 def _check_rag_available():
@@
     )
+
+
+async def _get_storage_types_or_503():
+    """Fetch storage types from knowledge_runtime or map failures to 503."""
+    try:
+        return await get_query_gateway().get_storage_types()
+    except RemoteRagGatewayError as exc:
+        logger.warning(
+            "Failed to fetch storage types from knowledge_runtime",
+            extra={"error_type": type(exc).__name__},
+        )
+        raise HTTPException(
+            status_code=503,
+            detail="RAG knowledge runtime is unavailable",
+        ) from exc
@@
-    gateway = RemoteRagGateway()
-    response = await gateway.get_storage_types()
+    response = await _get_storage_types_or_503()
@@
-    gateway = RemoteRagGateway()
-    response = await gateway.get_storage_types()
+    response = await _get_storage_types_or_503()
```

Also applies to: 58-59, 93-94
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/api/endpoints/adapter/retrievers.py` at line 18, Replace direct
instantiation of RemoteRagGateway() with the existing gateway factory used by
test_retriever_connection (use the same factory call rather than new
RemoteRagGateway()), and wrap calls to get_storage_types() (and any other remote
gateway calls at the three locations noted) in try/except that catches
RemoteRagGatewayError and maps it to a 503 response (e.g., raise/return an HTTP
503 with a clear message) instead of allowing a generic 500; reference
RemoteRagGateway, RemoteRagGatewayError and get_storage_types to locate the
changes.
```python
        return {"success": False, "message": str(e)}

    except RemoteRagGatewayError as e:
        logger.error(f"Retriever connection test failed: {str(e)}")
```
🧩 Analysis chain

🏁 Script executed:

```bash
# Verify Ruff no longer reports RUF010 for this file.
uv run ruff check backend/app/api/endpoints/adapter/retrievers.py --select RUF010
```

Repository: wecode-ai/Wegent
Length of output: 102

🏁 Script executed:

```bash
# First, examine the specific file and line
cat -n backend/app/api/endpoints/adapter/retrievers.py | sed -n '290,305p'
```

Repository: wecode-ai/Wegent
Length of output: 618

🏁 Script executed:

```bash
# Check if ruff is available as a direct command
which ruff && ruff --version
```

Repository: wecode-ai/Wegent
Length of output: 105

🏁 Script executed:

```bash
# If ruff is available, check for RUF010 violations in the file
ruff check backend/app/api/endpoints/adapter/retrievers.py --select RUF010 2>&1 || echo "Command failed or no violations found"
```

Repository: wecode-ai/Wegent
Length of output: 1868

🏁 Script executed:

```bash
# Get more context around line 301 to understand the fix needed
cat -n backend/app/api/endpoints/adapter/retrievers.py | sed -n '299,302p'
```

Repository: wecode-ai/Wegent
Length of output: 252
**Fix the Ruff RUF010 logging warnings.**

Use lazy logging for logger calls instead of f-string interpolation. Line 301 requires a different approach since it's not a logger call.

Proposed updates

```diff
     except RemoteRagGatewayError as e:
-        logger.error(f"Retriever connection test failed: {str(e)}")
+        logger.error("Retriever connection test failed: %s", e)
         return {"success": False, "message": str(e)}
     except Exception as e:
-        logger.error(f"Retriever connection test failed: {str(e)}")
-        return {"success": False, "message": f"Connection failed: {str(e)}"}
+        logger.error("Retriever connection test failed: %s", e)
+        return {"success": False, "message": f"Connection failed: {e!s}"}
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
    except RemoteRagGatewayError as e:
        logger.error("Retriever connection test failed: %s", e)
        return {"success": False, "message": str(e)}
    except Exception as e:
        logger.error("Retriever connection test failed: %s", e)
        return {"success": False, "message": f"Connection failed: {e!s}"}
```
🧰 Tools
🪛 Ruff (0.15.10)
[warning] 296-296: Use explicit conversion flag
Replace with conversion flag
(RUF010)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/api/endpoints/adapter/retrievers.py` at line 296, Replace
f-string interpolation inside logger calls with lazy logging: change the call
using logger.error(f"Retriever connection test failed: {str(e)}") to
logger.error("Retriever connection test failed: %s", e). For the non-logger
usage mentioned (the line 301 case), avoid f-strings by building the message
into a variable first (e.g., msg = "Retriever connection test failed:
{}".format(e) or msg = "Retriever connection test failed: " + str(e)) and then
use that variable where needed (or pass it to raise/return) so no f-string is
used outside logger calls; target the logger.error call and the surrounding code
that references e.
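The cost difference behind the lazy-logging advice can be demonstrated in isolation. This sketch is not the project's code; it simply counts how often string formatting runs when a log level is disabled:

```python
import logging

logging.basicConfig(level=logging.ERROR)
logger = logging.getLogger("demo")

class Expensive:
    """Counts how many times it is rendered to a string."""
    calls = 0
    def __str__(self):
        Expensive.calls += 1
        return "expensive"

e = Expensive()

logger.debug(f"value: {e}")   # f-string is evaluated even though DEBUG is off
logger.debug("value: %s", e)  # lazy form: formatting is skipped entirely

print(Expensive.calls)  # 1 -- only the f-string paid the rendering cost
```

The lazy form also lets log aggregators group records by the constant format string rather than by fully rendered messages.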
        --backend-url)
            BACKEND_URL="$2"
            shift 2
Apply --backend-url even when .env already exists.
When .env exists, --backend-url only changes the displayed value; it is never exported as BACKEND_INTERNAL_URL and the existing .env value wins at runtime. This makes the CLI flag ineffective for the common case.
Suggested fix
else
echo -e "${GREEN}✓ .env file already exists${NC}"
fi
+
+# Command-line value should override .env for this process.
+export BACKEND_INTERNAL_URL="$BACKEND_URL"
# Set PYTHONPATH to include project root
PROJECT_ROOT=$(cd .. && pwd)

Also applies to: 261-280, 294-295
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@knowledge_runtime/start.sh` around lines 58 - 60, The --backend-url handling
currently only updates BACKEND_URL variable used for display but does not export
or persist it when an existing .env is present, so the existing
BACKEND_INTERNAL_URL in the file wins at runtime; modify the start.sh logic that
parses the --backend-url option (the BACKEND_URL variable and the code paths
around exporting BACKEND_INTERNAL_URL) to always export BACKEND_INTERNAL_URL
from the parsed BACKEND_URL value and/or overwrite the .env entry regardless of
whether .env already exists so the flag takes effect; ensure you update the same
handling used in the other occurrences (around lines handling BACKEND_URL,
BACKEND_INTERNAL_URL at 261-280 and 294-295) so the flag is effective in all
code paths.
PYTHON_VERSION=$($python_exec --version 2>&1 | cut -d' ' -f2 | cut -d'.' -f1,2)

if [ "$(printf '%s\n' "$REQUIRED_VERSION" "$PYTHON_VERSION" | sort -V | head -n1)" != "$REQUIRED_VERSION" ]; then
    if [ "$need_exit" = "true" ]; then
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verification: run on macOS and Linux to confirm whether the current sort supports -V.
# Expected on macOS/BSD sort: non-zero exit or "illegal option" output.
printf '%s\n' "3.10" "3.11" | sort -V
Length of output: 69
🏁 Script executed:
# First, let's look at the file structure
cd knowledge_runtime && ls -la
Length of output: 559
🏁 Script executed:
# Check the exact content around lines 147-150
head -n 160 knowledge_runtime/start.sh | tail -n 30
Length of output: 1064
🏁 Script executed:
# Search for macOS-specific handling in the script
rg -n "macOS|darwin|Darwin|uname" knowledge_runtime/start.sh
Length of output: 125
🌐 Web query:
BSD sort macOS -V flag version compatibility 2024 2025
💡 Result:
macOS's BSD sort (/usr/bin/sort) does not support the -V (version sort) flag in 2024, 2025, or as of 2026-04-21. This GNU coreutils extension for natural sorting of version numbers (e.g., 1.10 before 1.2) is absent from BSD implementations. Man pages for macOS sort (OSX 10.5-10.6 and general) list options like -b, -d, -f, -g, -i, -M, -n, -r, -k, -t, -z, but no -V or --version-sort. Stack Overflow discussions from 2010-2022 confirm -V fails on macOS, recommending workarounds like multi-key numeric sorts (e.g., sort -t. -k1,1n -k2,2n -k3,3n) or GNU sort via Homebrew (brew install coreutils; gsort -V). No evidence of Apple adding -V in Sequoia (15, 2024), Sonoma (14, 2023), or later versions. FreeBSD sort man page (macOS base) explicitly includes -V as a non-POSIX extension, but Apple's version omits it. Use gsort from coreutils for -V compatibility.
Citations:
- 1: https://stackoverflow.com/questions/4265284/how-to-do-sort-v-in-osx
- 2: https://www.unix.com/man-page/osx/1/sort?os=osx&section=1&query=sort
- 3: https://superuser.com/questions/1087057/sort-v-on-mac-os-not-working
- 4: https://stackoverflow.com/questions/21394536/how-to-simulate-sort-v-on-macos
- 5: https://www.unix.com/man-page/osx/1/sort/
- 6: https://manp.gs/mac/1/sort
- 7: https://www.manpagez.com/man/1/sort/osx-10.5.3.php
- 8: https://www.manpagez.com/man/1/sort/osx-10.6.php
- 9: https://www.ss64.com/osx/sort.html
- 10: https://apple.stackexchange.com/questions/381772/is-there-a-way-to-change-the-order-in-which-the-finder-alphabetizes
- 11: https://apple.stackexchange.com/questions/459986/to-sort-files-case-sensitively
- 12: https://www.brew.sh
🏁 Script executed:
# Check more context around the function to understand the environment
head -n 200 knowledge_runtime/start.sh | tail -n 80
Length of output: 2997
🏁 Script executed:
# Verify if Python path is already available at this point in the script
grep -n "PYTHON_PATH\|python_exec\|python" knowledge_runtime/start.sh | head -n 20
Length of output: 1137
Use Python for version comparison instead of sort -V to ensure macOS compatibility.
The script explicitly handles macOS but relies on sort -V at line 148, which is unsupported by BSD sort on macOS. This causes valid Python installations to be rejected on macOS. Replace the sort-based comparison with Python's native version comparison:
Suggested fix
- PYTHON_VERSION=$($python_exec --version 2>&1 | cut -d' ' -f2 | cut -d'.' -f1,2)
+ PYTHON_VERSION=$("$python_exec" -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')
- if [ "$(printf '%s\n' "$REQUIRED_VERSION" "$PYTHON_VERSION" | sort -V | head -n1)" != "$REQUIRED_VERSION" ]; then
+ if "$python_exec" -c "import sys; required = tuple(map(int, '$REQUIRED_VERSION'.split('.'))); sys.exit(0 if sys.version_info[:2] >= required else 1)"; then
+ return 0
+ else
if [ "$need_exit" = "true" ]; then
echo -e "${RED}Error: Python $REQUIRED_VERSION or higher is required (found $PYTHON_VERSION)${NC}"
exit 1
fi
return 1
+ fi

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@knowledge_runtime/start.sh` around lines 147 - 150, The current check uses
sort -V (which fails on macOS); replace that logic by invoking the detected
python ($python_exec) to compare REQUIRED_VERSION and PYTHON_VERSION using
Python-native version parsing (e.g., split on '.' and compare integer tuples or
use packaging/distutils if available) and make the shell conditional depend on
that process exit code; update the block that references PYTHON_VERSION,
REQUIRED_VERSION and need_exit to call $python_exec -c "..." (return 0 if
PYTHON_VERSION >= REQUIRED_VERSION, non‑zero otherwise) and keep the existing
need_exit branch unchanged so macOS-compatible comparisons are used.
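The Python-native comparison the fix proposes can be sanity-checked on its own. The helper below is illustrative (meets_required is not a function in the repo); it shows why integer tuples beat both string comparison and a missing sort -V:

```python
import sys

def meets_required(required: str) -> bool:
    # Compare integer tuples, e.g. (3, 10) >= (3, 9); this avoids the
    # lexicographic trap where "3.9" sorts after "3.10" as a string.
    return sys.version_info[:2] >= tuple(map(int, required.split(".")))

print("3.10" >= "3.9")    # False: string comparison gets versions wrong
print((3, 10) >= (3, 9))  # True: tuple comparison is correct
print(meets_required("3.0"))  # True on any Python 3 interpreter
```

Because the comparison runs inside the interpreter being probed, it works identically with BSD and GNU userlands.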
if ! command -v uv &> /dev/null; then
    echo -e "${YELLOW}Warning: uv is not installed${NC}"
    echo -e "${YELLOW}Installing uv...${NC}"
    curl -LsSf https://astral.sh/uv/install.sh | sh

    if [ -f "$HOME/.cargo/env" ]; then
        source "$HOME/.cargo/env"
    fi

    if ! command -v uv &> /dev/null; then
        echo -e "${RED}Error: Failed to install uv${NC}"
        exit 1
    fi
fi
Avoid auto-executing a remote installer from the startup script.
curl ... | sh executes network-fetched code during service startup. This is a supply-chain risk and is inconsistent with the root script, which prints install instructions instead of auto-installing. Prefer failing with explicit instructions.
Suggested fix
if ! command -v uv &> /dev/null; then
- echo -e "${YELLOW}Warning: uv is not installed${NC}"
- echo -e "${YELLOW}Installing uv...${NC}"
- curl -LsSf https://astral.sh/uv/install.sh | sh
-
- if [ -f "$HOME/.cargo/env" ]; then
- source "$HOME/.cargo/env"
- fi
-
- if ! command -v uv &> /dev/null; then
- echo -e "${RED}Error: Failed to install uv${NC}"
- exit 1
- fi
+ echo -e "${RED}Error: uv is not installed${NC}"
+ echo -e "${YELLOW}Install uv, then rerun this script:${NC}"
+ echo -e " ${BLUE}curl -LsSf https://astral.sh/uv/install.sh | sh${NC}"
+ exit 1
fi🧰 Tools
🪛 Shellcheck (0.11.0)
[info] 222-222: Not following: ./.cargo/env was not specified as input (see shellcheck -x).
(SC1091)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@knowledge_runtime/start.sh` around lines 216 - 229, The startup script
currently auto-installs uv by piping a remote installer into sh when uv is
missing; remove that auto-execution and instead fail fast with clear manual
install instructions. In the uv presence-check block (the conditional using
command -v uv), remove the curl ... | sh invocation and the subsequent sourcing
of $HOME/.cargo/env, and replace them with an explicit error/warning message
that prints step-by-step install instructions (e.g., how to install via the
official installer or package manager) and then exit 1; keep the existing
pre-checks and final missing-uv error path but do not perform any
network-executed installs in start.sh.
        grep -v "^KNOWLEDGE_RUNTIME_PORT=" | \
        grep -v "^WEGENT_FRONTEND_PORT=" | \
        grep -v "^EXECUTOR_IMAGE=" | \
        grep -v "^WEGENT_SOCKET_URL=" > "$temp_file" || true
Keep KNOWLEDGE_RUNTIME_URL in sync with the configured port.
save_config writes KNOWLEDGE_RUNTIME_PORT but preserves any existing KNOWLEDGE_RUNTIME_URL, and CLI -k/--knowledge-runtime-port only changes the port. If .env already contains KNOWLEDGE_RUNTIME_URL=http://localhost:8200 and the user switches to port 8201, the runtime starts on 8201 while the backend still calls 8200.
Suggested fix
grep -v "^EXECUTOR_MANAGER_PORT=" | \
grep -v "^KNOWLEDGE_RUNTIME_PORT=" | \
+ grep -v "^KNOWLEDGE_RUNTIME_URL=" | \
grep -v "^WEGENT_FRONTEND_PORT=" | \
@@
KNOWLEDGE_RUNTIME_PORT=$KNOWLEDGE_RUNTIME_PORT
+KNOWLEDGE_RUNTIME_URL=http://localhost:$KNOWLEDGE_RUNTIME_PORT
WEGENT_FRONTEND_PORT=$WEGENT_FRONTEND_PORT
@@
echo "EXECUTOR_MANAGER_PORT=$EXECUTOR_MANAGER_PORT" >> "$temp_file"
echo "KNOWLEDGE_RUNTIME_PORT=$KNOWLEDGE_RUNTIME_PORT" >> "$temp_file"
+ echo "KNOWLEDGE_RUNTIME_URL=http://localhost:$KNOWLEDGE_RUNTIME_PORT" >> "$temp_file"
echo "WEGENT_FRONTEND_PORT=$WEGENT_FRONTEND_PORT" >> "$temp_file"
@@
-[ -n "$CLI_KNOWLEDGE_RUNTIME_PORT" ] && KNOWLEDGE_RUNTIME_PORT="$CLI_KNOWLEDGE_RUNTIME_PORT"
+if [ -n "$CLI_KNOWLEDGE_RUNTIME_PORT" ]; then
+ KNOWLEDGE_RUNTIME_PORT="$CLI_KNOWLEDGE_RUNTIME_PORT"
+ KNOWLEDGE_RUNTIME_URL=""
+fi

Also applies to: 81-104, 894-905
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@start.sh` around lines 58 - 61, The save_config logic currently removes
KNOWLEDGE_RUNTIME_PORT but leaves an existing KNOWLEDGE_RUNTIME_URL unchanged,
causing URL/port drift; update the script (the save_config/port-writing sections
that include the grep -v lines and the blocks around lines referenced) to also
remove any existing KNOWLEDGE_RUNTIME_URL entry when rewriting .env and then
write a new KNOWLEDGE_RUNTIME_URL derived from the configured
KNOWLEDGE_RUNTIME_PORT (e.g., construct
"http://localhost:${KNOWLEDGE_RUNTIME_PORT}" or use the existing host variable
if present) so the URL always matches the saved port; apply the same change to
the other similar blocks noted (around lines 81-104 and 894-905) so all places
that update KNOWLEDGE_RUNTIME_PORT also refresh KNOWLEDGE_RUNTIME_URL.
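As a sketch of the invariant the comment asks for, the helper below regenerates the URL from the port whenever the port is saved, so the two .env keys cannot drift. The function name is hypothetical and the localhost host is an assumption carried over from the suggested fix:

```python
def sync_runtime_url(env: dict[str, str], port: int) -> dict[str, str]:
    """Rewrite the two knowledge-runtime keys together so they cannot drift."""
    updated = dict(env)
    updated["KNOWLEDGE_RUNTIME_PORT"] = str(port)
    # Always regenerate the URL from the port instead of preserving a stale one.
    updated["KNOWLEDGE_RUNTIME_URL"] = f"http://localhost:{port}"
    return updated

stale = {"KNOWLEDGE_RUNTIME_URL": "http://localhost:8200"}
print(sync_runtime_url(stale, 8201)["KNOWLEDGE_RUNTIME_URL"])  # http://localhost:8201
```

Deriving the URL at write time keeps a single source of truth (the port) instead of two values that must be edited in tandem.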
- Use pyrfc6266 instead of a hand-written regex to parse Content-Disposition
- Correctly handle the RFC 5987 filename* parameter (UTF-8 encoded filenames)
- Fix start.sh so venv creation uses the specified Python version
- Add pyrfc6266 dependency
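As an aside on the filename* handling this commit mentions: the standard library's email.message.Message also decodes RFC 2231 extended parameters, which use the same charset''percent-value syntax as RFC 5987. A minimal sketch (the header value below is an example; the project itself uses pyrfc6266):

```python
from email.message import Message

# Hypothetical header value; %E6%8A%A5%E5%91%8A is the UTF-8 encoding of "报告".
msg = Message()
msg["Content-Disposition"] = "attachment; filename*=UTF-8''%E6%8A%A5%E5%91%8A.pdf"

print(msg.get_filename())  # 报告.pdf
```

A dedicated RFC 6266 parser is still the safer choice for HTTP, since it also covers quoting edge cases and the plain filename fallback ordering.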
…ests asyncio.to_thread expects sync functions, not coroutines. Using AsyncMock returns a coroutine object instead of the actual return value, causing TypeError: 'coroutine' object is not subscriptable errors.
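The failure mode is easy to reproduce outside the test suite: asyncio.to_thread invokes its target synchronously in a worker thread, so an AsyncMock target hands back an un-awaited coroutine rather than the stubbed value. A standalone sketch:

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock

async def main():
    # A sync MagicMock works: to_thread calls it and returns the stubbed dict.
    sync_stub = MagicMock(return_value={"status": "ok"})
    result = await asyncio.to_thread(sync_stub)
    print(result["status"])  # ok

    # An AsyncMock does not: calling it returns a coroutine object, which
    # to_thread passes through untouched, so subscripting raises TypeError.
    async_stub = AsyncMock(return_value={"status": "ok"})
    result = await asyncio.to_thread(async_stub)
    try:
        result["status"]
    except TypeError as exc:
        print(type(result).__name__, exc.__class__.__name__)
    result.close()  # silence the "never awaited" warning

asyncio.run(main())
```

The rule of thumb: mock whatever the production code calls the way production calls it; to_thread targets are called, not awaited.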
Actionable comments posted: 2
🧹 Nitpick comments (5)
knowledge_runtime/start.sh (3)
109-109: Quote $port in lsof invocation (SC2086).

Minor hygiene flagged by shellcheck; $port is validated but the unquoted expansion is still best avoided.

🛠️ Proposed fix

- if lsof -Pi :$port -sTCP:LISTEN -t >/dev/null 2>&1; then
+ if lsof -Pi ":$port" -sTCP:LISTEN -t >/dev/null 2>&1; then

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_runtime/start.sh` at line 109, The lsof invocation uses an unquoted variable expansion (the $port in the condition) which triggers SC2086; update the lsof call to quote the port variable (use "$port") in the expression that checks for a listening socket so the expansion is safe even if the value contains spaces or glob characters—locate the conditional containing lsof -Pi :$port -sTCP:LISTEN -t and replace the unquoted $port with a quoted "$port".
282-285: Avoid leading:inPYTHONPATHwhen unset.If
PYTHONPATHis empty on the invoking shell (the common case), the result is:${PROJECT_ROOT}, which Python interprets as including the current working directory onsys.path. Guard the separator:🛠️ Proposed fix
-PROJECT_ROOT=$(cd .. && pwd) -export PYTHONPATH="${PYTHONPATH}:${PROJECT_ROOT}" +PROJECT_ROOT=$(cd .. && pwd) +export PYTHONPATH="${PYTHONPATH:+${PYTHONPATH}:}${PROJECT_ROOT}"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_runtime/start.sh` around lines 282 - 285, The current startup snippet sets PYTHONPATH by appending with a leading colon which yields ":{PROJECT_ROOT}" when PYTHONPATH is empty; modify the logic in start.sh around the PROJECT_ROOT and PYTHONPATH export so you only prepend a colon when PYTHONPATH is non-empty (e.g., test [ -z "$PYTHONPATH" ] and export PYTHONPATH="$PROJECT_ROOT" else export PYTHONPATH="${PYTHONPATH}:$PROJECT_ROOT"), ensuring the echo message still prints the resolved PROJECT_ROOT variable.
249-253: Dead $? check — set -e already aborts on failure.

With set -e at line 10, uv sync failing terminates the script before reaching line 250, so the if [ $? -ne 0 ] block is unreachable (also SC2181). Either drop the check or run uv sync inside an if ! ...; then guarded block if you want a custom message.

🛠️ Proposed fix

-# Sync dependencies
-uv sync
-if [ $? -ne 0 ]; then
-    echo -e "${RED}Error: Failed to install dependencies${NC}"
-    exit 1
-fi
+# Sync dependencies
+if ! uv sync; then
+    echo -e "${RED}Error: Failed to install dependencies${NC}"
+    exit 1
+fi

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@knowledge_runtime/start.sh` around lines 249 - 253, The post-run `$?` check after running `uv sync` is unreachable because the script uses `set -e`; update the `uv sync` invocation to either remove the `if [ $? -ne 0 ]` block entirely or replace the plain call plus `$?` check with an explicit guarded form like `if ! uv sync; then ...` so you can emit the custom error message and exit; locate the call to `uv sync` in start.sh (and the surrounding `set -e` at the top) and apply the chosen fix to remove the dead `$?` check or wrap the `uv sync` in an `if ! ...; then` guard.

knowledge_runtime/knowledge_runtime/services/content_fetcher.py (2)
Verify each finding against the current code and only fix it if needed. In `@knowledge_runtime/start.sh` around lines 249 - 253, The post-run `$?` check after running `uv sync` is unreachable because the script uses `set -e`; update the `uv sync` invocation to either remove the `if [ $? -ne 0 ]` block entirely or replace the plain call plus `$?` check with an explicit guarded form like `if ! uv sync; then ...` so you can emit the custom error message and exit; locate the call to `uv sync` in start.sh (and the surrounding `set -e` at the top) and apply the chosen fix to remove the dead `$?` check or wrap the `uv sync` in an `if ! ...; then` guard.knowledge_runtime/knowledge_runtime/services/content_fetcher.py (2)
68-86: Missing tracing decorators on async methods.Per coding guidelines, async functions should be annotated with
@trace_async().fetch,_fetch_from_presigned_url, and_fetch_from_backend_streamare all async and part of a new service on the data path — likely the most useful place to have spans for troubleshooting. Consider adding the decorators (andadd_span_event/set_span_attributefor URL/status details) unless there's a reason this module is intentionally excluded from tracing.As per coding guidelines: "Use
@trace_async()decorator for entire async functions,@trace_sync()for entire sync functions,add_span_event()to add events,set_span_attribute()to set attributes".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_runtime/knowledge_runtime/services/content_fetcher.py` around lines 68 - 86, Add tracing to the async content fetchers: decorate fetch, _fetch_from_presigned_url, and _fetch_from_backend_stream with `@trace_async`(), and instrument them to emit useful span details by calling add_span_event() for notable events (e.g., "fetch_started", "presigned_url_requested", "backend_stream_opened") and set_span_attribute() for key attributes (e.g., request URL, HTTP status, content_ref.id). Ensure the decorators are imported from the tracing utilities and place span events/attributes near network calls and on error paths inside the named functions to capture URL/status/error details for troubleshooting.
62-67: Nit: extract retry-policy magic numbers to module constants.

stop_after_attempt(3) and wait_exponential(multiplier=1, min=1, max=10) are tuning knobs that are easier to reason about (and tweak per deployment) when pulled out as named constants — also aligns with the guideline to extract magic numbers.

+_MAX_FETCH_ATTEMPTS = 3
+_RETRY_WAIT_MIN_SECONDS = 1
+_RETRY_WAIT_MAX_SECONDS = 10
 ...
-    stop=stop_after_attempt(3),
-    wait=wait_exponential(multiplier=1, min=1, max=10),
+    stop=stop_after_attempt(_MAX_FETCH_ATTEMPTS),
+    wait=wait_exponential(
+        multiplier=1, min=_RETRY_WAIT_MIN_SECONDS, max=_RETRY_WAIT_MAX_SECONDS
+    ),

As per coding guidelines: "extract magic numbers to constants in Python code".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_runtime/knowledge_runtime/services/content_fetcher.py` around lines 62 - 67, Extract the numeric retry-policy literals into named module-level constants and use them in the retry decorator: define constants like RETRY_ATTEMPTS = 3, WAIT_MULTIPLIER = 1, WAIT_MIN = 1, WAIT_MAX = 10 at the top of the module and replace stop_after_attempt(3) and wait_exponential(multiplier=1, min=1, max=10) with stop_after_attempt(RETRY_ATTEMPTS) and wait_exponential(multiplier=WAIT_MULTIPLIER, min=WAIT_MIN, max=WAIT_MAX); keep the same retry wrapper and _is_retryable reference unchanged.
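The snippet above assumes tenacity's stop_after_attempt/wait_exponential. To show the shape of the suggestion without that dependency, here is a simplified stand-in retry decorator driven by the same kind of named constants (all names are illustrative; backoffs are recorded instead of slept, for the demo):

```python
import functools

# Named knobs, as the review suggests: easier to audit and tweak per deployment.
MAX_FETCH_ATTEMPTS = 3
RETRY_WAIT_MIN_SECONDS = 1
RETRY_WAIT_MAX_SECONDS = 10

def retry(attempts=MAX_FETCH_ATTEMPTS, waits=None):
    """Minimal retry decorator; computed backoffs go into `waits` for inspection."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == attempts:
                        raise
                    # Exponential backoff clamped to [min, max], like wait_exponential
                    delay = min(max(2 ** (attempt - 1), RETRY_WAIT_MIN_SECONDS),
                                RETRY_WAIT_MAX_SECONDS)
                    if waits is not None:
                        waits.append(delay)
        return wrapper
    return decorator

calls = []
recorded = []

@retry(waits=recorded)
def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("transient")
    return "ok"

print(flaky(), len(calls), recorded)  # ok 3 [1, 2]
```

With the limits named at module level, a deployment that needs more patience only has to change one constant rather than hunt for literals inside decorators.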
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@knowledge_runtime/knowledge_runtime/services/content_fetcher.py`:
- Around line 183-203: The _extract_filename_from_url function currently splits
the URL by "?" and leaves percent-encoding in the filename; update it to parse
the URL path using urllib.parse.urlparse(url).path, take the last path segment,
then percent-decode that segment with urllib.parse.unquote before extracting the
extension (keep the existing extension logic using rsplit(".", 1)). Ensure you
still handle trailing slashes and return a sensible default like "unknown" when
no filename segment exists.
In `@knowledge_runtime/start.sh`:
- Around line 294-295: The startup script currently hardcodes uvicorn --reload
in the line invoking ".venv/bin/python -m uvicorn knowledge_runtime.main:app
--host \"$HOST\" --port \"$PORT\" --reload"; remove the unconditional --reload
and make it opt-in by checking an environment flag or script argument (for
example DEV=true or a --dev flag) and only appending "--reload" when that flag
is set; update start.sh to build the uvicorn command conditionally so
production/default runs do not use --reload but dev runs can enable it.
---
Nitpick comments:
In `@knowledge_runtime/knowledge_runtime/services/content_fetcher.py`:
- Around line 68-86: Add tracing to the async content fetchers: decorate fetch,
_fetch_from_presigned_url, and _fetch_from_backend_stream with `@trace_async`(),
and instrument them to emit useful span details by calling add_span_event() for
notable events (e.g., "fetch_started", "presigned_url_requested",
"backend_stream_opened") and set_span_attribute() for key attributes (e.g.,
request URL, HTTP status, content_ref.id). Ensure the decorators are imported
from the tracing utilities and place span events/attributes near network calls
and on error paths inside the named functions to capture URL/status/error
details for troubleshooting.
- Around line 62-67: Extract the numeric retry-policy literals into named
module-level constants and use them in the retry decorator: define constants
like RETRY_ATTEMPTS = 3, WAIT_MULTIPLIER = 1, WAIT_MIN = 1, WAIT_MAX = 10 at the
top of the module and replace stop_after_attempt(3) and
wait_exponential(multiplier=1, min=1, max=10) with
stop_after_attempt(RETRY_ATTEMPTS) and
wait_exponential(multiplier=WAIT_MULTIPLIER, min=WAIT_MIN, max=WAIT_MAX); keep
the same retry wrapper and _is_retryable reference unchanged.
In `@knowledge_runtime/start.sh`:
- Line 109: The lsof invocation uses an unquoted variable expansion (the $port
in the condition) which triggers SC2086; update the lsof call to quote the port
variable (use "$port") in the expression that checks for a listening socket so
the expansion is safe even if the value contains spaces or glob
characters—locate the conditional containing lsof -Pi :$port -sTCP:LISTEN -t and
replace the unquoted $port with a quoted "$port".
- Around line 282-285: The current startup snippet sets PYTHONPATH by appending
with a leading colon which yields ":{PROJECT_ROOT}" when PYTHONPATH is empty;
modify the logic in start.sh around the PROJECT_ROOT and PYTHONPATH export so
you only prepend a colon when PYTHONPATH is non-empty (e.g., test [ -z
"$PYTHONPATH" ] and export PYTHONPATH="$PROJECT_ROOT" else export
PYTHONPATH="${PYTHONPATH}:$PROJECT_ROOT"), ensuring the echo message still
prints the resolved PROJECT_ROOT variable.
- Around line 249-253: The post-run `$?` check after running `uv sync` is
unreachable because the script uses `set -e`; update the `uv sync` invocation to
either remove the `if [ $? -ne 0 ]` block entirely or replace the plain call
plus `$?` check with an explicit guarded form like `if ! uv sync; then ...` so
you can emit the custom error message and exit; locate the call to `uv sync` in
start.sh (and the surrounding `set -e` at the top) and apply the chosen fix to
remove the dead `$?` check or wrap the `uv sync` in an `if ! ...; then` guard.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1e3c23ef-3488-4725-b899-bd326bb05bf7
⛔ Files ignored due to path filters (1)
knowledge_runtime/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
- knowledge_runtime/knowledge_runtime/services/content_fetcher.py
- knowledge_runtime/pyproject.toml
- knowledge_runtime/start.sh
- knowledge_runtime/tests/test_admin_executor.py
✅ Files skipped from review due to trivial changes (2)
- knowledge_runtime/tests/test_admin_executor.py
- knowledge_runtime/pyproject.toml
def _extract_filename_from_url(self, url: str) -> tuple[str, str]:
    """Extract filename and extension from URL path.

    Args:
        url: The URL to extract from.

    Returns:
        Tuple of (filename, extension).
    """
    # Try to get the last path segment
    path = url.split("?")[0]  # Remove query params
    parts = path.rstrip("/").split("/")
    filename = parts[-1] if parts else "unknown"

    # Extract extension
    if "." in filename:
        extension = "." + filename.rsplit(".", 1)[-1]
    else:
        extension = ""

    return filename, extension
URL-path filename is not percent-decoded.
For presigned URLs whose object key contains encoded characters (e.g. /bucket/my%20report.pdf or non-ASCII names), source_file will be stored with the percent-encoding intact (my%20report.pdf). The extension detection still works for common cases, but the persisted source_file that flows into IndexExecutor.execute() will be a mangled display name, and filenames whose encoding covers the extension separator (%2E) will lose their extension entirely.
Consider urllib.parse.unquote on the last path segment, and using urllib.parse.urlparse(url).path instead of manual split("?")[0] to also strip fragments and be robust to auth/userinfo components.
🛠️ Suggested tweak
- # Try to get the last path segment
- path = url.split("?")[0] # Remove query params
- parts = path.rstrip("/").split("/")
- filename = parts[-1] if parts else "unknown"
+ from urllib.parse import unquote, urlparse
+
+ path = urlparse(url).path
+ parts = path.rstrip("/").split("/")
+ filename = unquote(parts[-1]) if parts and parts[-1] else "unknown"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@knowledge_runtime/knowledge_runtime/services/content_fetcher.py` around lines
183 - 203, The _extract_filename_from_url function currently splits the URL by
"?" and leaves percent-encoding in the filename; update it to parse the URL path
using urllib.parse.urlparse(url).path, take the last path segment, then
percent-decode that segment with urllib.parse.unquote before extracting the
extension (keep the existing extension logic using rsplit(".", 1)). Ensure you
still handle trailing slashes and return a sensible default like "unknown" when
no filename segment exists.
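The difference is easy to see with an encoded object key (the URL below is a made-up example):

```python
from urllib.parse import unquote, urlparse

url = "https://minio.example/bucket/my%20report.pdf?X-Amz-Signature=abc#frag"

# Manual split leaves percent-encoding intact (and would also keep a
# fragment whenever there is no query string to split on).
naive = url.split("?")[0].rstrip("/").split("/")[-1]
print(naive)  # my%20report.pdf

# urlparse strips query and fragment; unquote restores the display name.
decoded = unquote(urlparse(url).path.rstrip("/").split("/")[-1])
print(decoded)  # my report.pdf
```

Decoding also protects extension detection when the dot itself is encoded (%2E), which the naive version misclassifies as extensionless.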
# Start uvicorn
.venv/bin/python -m uvicorn knowledge_runtime.main:app --host "$HOST" --port "$PORT" --reload
Remove unconditional --reload; this is a dev-only flag.
uvicorn --reload spawns a file-watcher process and reloads workers on any source change. That's inappropriate as the default for a "one-click startup" script that will also be used for local smoke tests and container/service deployments: it increases memory, interacts poorly with process-manager signal handling, and can cause surprise restarts while running. Make it opt-in via a flag (e.g., --dev) or remove it.
🛠️ Proposed fix
+# Parse --dev flag earlier in the arg loop; default off
+RELOAD_FLAG=""
+if [ "${DEV_MODE:-false}" = "true" ]; then
+ RELOAD_FLAG="--reload"
+fi
+
# Start uvicorn
-.venv/bin/python -m uvicorn knowledge_runtime.main:app --host "$HOST" --port "$PORT" --reload
+.venv/bin/python -m uvicorn knowledge_runtime.main:app --host "$HOST" --port "$PORT" $RELOAD_FLAG🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@knowledge_runtime/start.sh` around lines 294 - 295, The startup script
currently hardcodes uvicorn --reload in the line invoking ".venv/bin/python -m
uvicorn knowledge_runtime.main:app --host \"$HOST\" --port \"$PORT\" --reload";
remove the unconditional --reload and make it opt-in by checking an environment
flag or script argument (for example DEV=true or a --dev flag) and only
appending "--reload" when that flag is set; update start.sh to build the uvicorn
command conditionally so production/default runs do not use --reload but dev
runs can enable it.
Summary by CodeRabbit
New Features
Chores
Documentation