Feature/rag service extraction #927
Conversation
📝 Walkthrough

The PR extracts a backend-agnostic RAG execution kernel into a new `knowledge_engine` package, adds a standalone `knowledge_runtime` service, and routes backend retrieval and indexing through local and remote RAG gateways.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Backend
    participant Gateway as RagGateway
    participant Local as LocalRagGateway
    participant Remote as RemoteRagGateway
    participant KR as knowledge_runtime
    participant KE as knowledge_engine
    participant Storage

    rect rgba(200,150,255,0.5)
        Note over Client,Storage: Local retrieval flow
        Client->>Backend: POST /api/internal/rag/retrieve
        Backend->>Backend: build QueryRuntimeSpec (may finalize route_mode)
        Backend->>Gateway: get_query_gateway()
        Gateway-->>Backend: LocalRagGateway
        Backend->>Local: LocalRagGateway.query(spec, db)
        Local->>KE: QueryExecutor.execute(...)
        KE->>Storage: retrieve(...)
        Storage-->>KE: records
        KE-->>Local: records
        Local-->>Backend: records
        Backend-->>Client: { records: [...] }
    end
```
```mermaid
sequenceDiagram
    participant Client
    participant Backend
    participant Gateway as RagGateway
    participant Remote as RemoteRagGateway
    participant KR as knowledge_runtime
    participant KE as knowledge_engine
    participant Storage

    rect rgba(255,200,150,0.5)
        Note over Client,Storage: Remote retrieval flow with fallback
        Client->>Backend: POST /api/internal/rag/retrieve
        Backend->>Backend: build QueryRuntimeSpec (route_mode may be "auto")
        Backend->>Gateway: get_query_gateway()
        Gateway-->>Backend: RemoteRagGateway
        Backend->>Remote: POST /internal/rag/query (RemoteRagGateway.query)
        Remote->>KR: runtime handler validates and fetches content (if needed)
        KR->>KE: QueryExecutor.execute(...)
        KE->>Storage: retrieve(...)
        Storage-->>KE: records
        KE-->>KR: records
        KR-->>Remote: RemoteQueryResponse
        Remote-->>Backend: RemoteQueryResponse
        alt remote raises RemoteRagGatewayError
            Backend->>Local: LocalRagGateway.query(spec, db) -- fallback
            Local-->>Backend: records
        end
        Backend-->>Client: { records: [...] }
    end
```
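For orientation, a minimal sketch of the fallback step from the second diagram, written against the gateway names this PR introduces (the exact `query` signature is assumed from the diagram labels, not verified):

```python
# Sketch of the remote -> local fallback shown above (assumed signatures).
try:
    # In remote mode, get_query_gateway() returns a RemoteRagGateway.
    records = await get_query_gateway().query(spec, db=db)
except RemoteRagGatewayError:
    # Remote runtime failed; retry against local storage.
    records = await LocalRagGateway().query(spec, db=db)
```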
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
Actionable comments posted: 16
Note
Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.
🟡 Minor comments (9)
docs/plans/2026-04-04-rag-service-extraction-implementation-plan.md (1)

94-95: ⚠️ Potential issue | 🟡 Minor
Replace local absolute paths with repository-relative doc links.

Lines 94-95 currently link to `/Users/...`, which will break for everyone except the original author's machine.

🔗 Proposed fix

```diff
-- Modify: [2026-04-03-rag-service-extraction-design.md](/Users/yanhe1/iDev/weibo/wegent/docs/specs/knowledge/2026-04-03-rag-service-extraction-design.md)
-- Modify: [2026-04-04-rag-service-extraction-implementation-plan.md](/Users/yanhe1/iDev/weibo/wegent/docs/plans/2026-04-04-rag-service-extraction-implementation-plan.md)
+- Modify: [2026-04-03-rag-service-extraction-design.md](../specs/knowledge/2026-04-03-rag-service-extraction-design.md)
+- Modify: [2026-04-04-rag-service-extraction-implementation-plan.md](./2026-04-04-rag-service-extraction-implementation-plan.md)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `docs/plans/2026-04-04-rag-service-extraction-implementation-plan.md` around lines 94-95: the links point to absolute local paths (e.g., /Users/yanhe1/.../2026-04-03-rag-service-extraction-design.md and /Users/yanhe1/.../2026-04-04-rag-service-extraction-implementation-plan.md); update those to repository-relative links (e.g., docs/specs/knowledge/2026-04-03-rag-service-extraction-design.md and docs/plans/2026-04-04-rag-service-extraction-implementation-plan.md) so the references work for all contributors and CI, keeping the same display text.

backend/tests/services/rag/test_retrieval_service.py (1)
334-336: ⚠️ Potential issue | 🟡 Minor
Add return type hints to the new test methods.

Both new methods should declare `-> None` to comply with the Python typing rule. As per coding guidelines, "Python: Follow PEP 8, Black formatter (line length: 88), isort, type hints required".

🔧 Suggested typing fix

```diff
     def test_decide_route_mode_for_chat_shell_returns_rag_retrieval_without_budget(
         self,
-    ):
+    ) -> None:
@@
     def test_decide_route_mode_for_chat_shell_returns_direct_injection_when_auto_fits(
         self,
-    ):
+    ) -> None:
```

Also applies to: 352-354

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `backend/tests/services/rag/test_retrieval_service.py` around lines 334-336: add explicit `-> None` return type hints to the new test function definitions so they conform to the project's typing rules; update the signature of test_decide_route_mode_for_chat_shell_returns_rag_retrieval_without_budget (and the other new test method around lines 352-354) to include `-> None` after the parameter list, keeping the rest of the signature unchanged.

backend/app/services/rag/storage/factory.py (1)
166-189: ⚠️ Potential issue | 🟡 Minor
Fail fast when runtime storage URL is missing.

The new runtime path accepts `url=None` and passes it to backend constructors, which can fail later with less actionable errors. Validate it here and raise a clear `ValueError`.

🔧 Suggested fix

```diff
 def create_storage_backend_from_runtime_config(
     retriever_config: RuntimeRetrieverConfig,
 ) -> BaseStorageBackend:
     """Create storage backend from resolved runtime retriever config."""
     storage_config = retriever_config.storage_config or {}
     storage_type = storage_config.get("type", "").lower()
     if storage_type not in STORAGE_BACKEND_REGISTRY:
         raise ValueError(
             f"Unsupported storage type: {storage_type}. "
             f"Supported types: {list(STORAGE_BACKEND_REGISTRY.keys())}"
         )
+    if not storage_config.get("url"):
+        raise ValueError("Storage url is required in runtime storage_config")
     config = {
         "url": storage_config.get("url"),
         "username": storage_config.get("username"),
         "password": storage_config.get("password"),
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `backend/app/services/rag/storage/factory.py` around lines 166-189: create_storage_backend_from_runtime_config currently allows storage_config.get("url") to be None and forwards it to backend constructors; after ensuring storage_type is supported, check that the resolved URL is non-empty and raise a clear ValueError (e.g., "Missing storage URL for storage_type '<type>'") if absent, then build the config dict and instantiate backend_class from STORAGE_BACKEND_REGISTRY as before.

backend/app/services/rag/local_gateway.py (1)
45-47: ⚠️ Potential issue | 🟡 Minor
Add explicit `db` guard for parity with other gateway methods.

`index_document` and `query` fail fast when `db` is missing; `delete_document_index` should do the same for consistent error behavior.

🔧 Suggested fix

```diff
     async def delete_document_index(
         self,
         spec: DeleteRuntimeSpec,
         *,
         db: Session,
     ) -> dict:
+        if db is None:
+            raise ValueError("db is required for LocalRagGateway.delete_document_index")
         return await self._delete_executor(spec, db=db)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `backend/app/services/rag/local_gateway.py` around lines 45-47: delete_document_index currently forwards to _delete_executor without validating the db parameter; add an explicit guard at the start of delete_document_index (the same way index_document and query do) that raises an immediate error when db is missing, then call and return await self._delete_executor(spec, db=db).

knowledge_engine/tests/test_document_service.py (1)
45-55: ⚠️ Potential issue | 🟡 Minor
Weak assertion for `created_at` field.

Line 52 asserts `"created_at": result["created_at"]`, which always passes since it compares the result to itself. This doesn't validate that a timestamp was actually generated.

Proposed fix using isinstance or ANY matcher

```diff
+from unittest.mock import ANY
+
     assert result == {
         "doc_ref": "9",
         "knowledge_id": "1",
         "source_file": "report.pdf",
         "chunk_count": 2,
         "index_name": "wegent_kb_1",
         "status": "success",
-        "created_at": result["created_at"],
+        "created_at": ANY,
         "chunks_data": [{"chunk_index": 0}, {"chunk_index": 1}],
         "indexed_count": 2,
     }
+    # Optionally verify it's a valid ISO timestamp
+    assert isinstance(result["created_at"], str)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `knowledge_engine/tests/test_document_service.py` around lines 45-55: the test currently compares "created_at" to itself, which always succeeds; update the assertion to verify created_at is present and a valid timestamp (e.g., assert "created_at" in result and that result["created_at"] is a string parsable by datetime.fromisoformat or matches an ISO/timestamp regex, or use a matcher like unittest.mock.ANY) so the test actually validates a timestamp was generated.

docs/specs/knowledge/2026-04-03-rag-service-extraction-design.md (1)
333-360: ⚠️ Potential issue | 🟡 Minor
Normalize subsection heading depth to project docs hierarchy.

Line 333 and Line 347 use `####`, but this repo's docs convention caps subsection depth at `###`.

📄 Suggested doc fix

```diff
-#### `backend_attachment_stream`
+### `backend_attachment_stream`
 ...
-#### `presigned_url`
+### `presigned_url`
```

As per coding guidelines, "Use consistent heading hierarchy: `#` for title, `##` for sections, `###` for subsections".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `docs/specs/knowledge/2026-04-03-rag-service-extraction-design.md` around lines 333-360: the subsection headings for `backend_attachment_stream` and `presigned_url` use four-level headers (####) but must follow the repo convention that caps subsections at three-level headers (###); update those two headings from `####` to `###` so they match the project's hierarchy and restore consistent heading depth across the document.

backend/app/services/rag/content_refs.py (1)
29-45: ⚠️ Potential issue | 🟡 Minor
Validate `expires_delta_seconds` to prevent invalid/expired content refs.

Line 40 can produce already-expired refs when `expires_delta_seconds <= 0`. Add a fail-fast guard.

🧪 Suggested fix

```diff
 def build_content_ref_for_attachment(
     *,
     db: Session,
     attachment_id: int,
     expires_delta_seconds: int = DEFAULT_CONTENT_REF_TTL_SECONDS,
 ) -> BackendAttachmentStreamContentRef | PresignedUrlContentRef:
     """Build a content reference for an attachment used by remote indexing."""
+    if expires_delta_seconds <= 0:
+        raise ValueError("expires_delta_seconds must be greater than 0")
     context = context_service.get_context_optional(
         db=db,
         context_id=attachment_id,
     )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `backend/app/services/rag/content_refs.py` around lines 29-45: fail fast when expires_delta_seconds is non-positive to avoid creating already-expired refs; validate the parameter (default DEFAULT_CONTENT_REF_TTL_SECONDS) right after the signature and before calling context_service.get_context_optional, raising a ValueError (e.g., "expires_delta_seconds must be positive") if expires_delta_seconds <= 0, and otherwise proceed to compute expires_at and call context_service.get_attachment_url as before.

shared/tests/test_knowledge_runtime_protocol.py (1)
73-90: ⚠️ Potential issue | 🟡 Minor
Isolate unknown `content_ref.kind` as the sole failing validation.

Lines 82-83 provide incomplete nested configs that lack required fields (`name` for retriever, `model_name` for embedding). This causes the test to raise `ValidationError` before validating the `content_ref.kind` discriminator, obscuring the test's intended assertion.

Fix: supply complete, schema-conformant configs

```diff
     with pytest.raises(ValidationError):
         remote_index_request.model_validate(
             {
                 "knowledge_base_id": 11,
                 "document_id": 22,
                 "index_owner_user_id": 33,
-                "retriever_config": {"provider": "milvus"},
-                "embedding_model_config": {"model": "text-embedding-3-large"},
+                "retriever_config": {
+                    "name": "retriever-a",
+                    "namespace": "default",
+                    "storage_config": {"type": "milvus", "url": "http://milvus:19530"},
+                },
+                "embedding_model_config": {
+                    "model_name": "text-embedding-3-large",
+                    "model_namespace": "default",
+                    "resolved_config": {"protocol": "openai"},
+                },
                 "index_families": ["chunk_vector"],
                 "content_ref": {
                     "kind": "unsupported_kind",
                     "url": "http://backend:8000/api/internal/rag/content/22",
                 },
             }
         )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `shared/tests/test_knowledge_runtime_protocol.py` around lines 73-90: the test test_remote_index_request_rejects_unknown_content_ref_kind currently fails earlier because retriever_config and embedding_model_config are incomplete; update the remote_index_request.model_validate payload to supply schema-conformant configs (include the required retriever field like "name" in retriever_config and the required embedding field like "model_name" in embedding_model_config) so ValidationError is triggered solely by the unsupported content_ref.kind discriminator; keep the same content_ref with "kind": "unsupported_kind".

docker-compose-rag.yml (1)
21-22: ⚠️ Potential issue | 🟡 Minor
Healthcheck command exposes password in process listing.

The `-p123456` flag makes the root password visible via `ps aux`. Use an environment variable reference instead:

```diff
 healthcheck:
-  test: ["CMD", "mysqladmin", "ping", "-h", "localhost", "-u", "root", "-p123456"]
+  test: ["CMD-SHELL", "mysqladmin ping -h localhost -u root -p$$MYSQL_ROOT_PASSWORD"]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `docker-compose-rag.yml` around lines 21-22: the healthcheck currently exposes the root password by using the literal "-p123456"; change the healthcheck for the MySQL service to read the password from the environment (e.g., MYSQL_ROOT_PASSWORD) instead of a hardcoded value; update the "test" entry to invoke a shell (sh -c) so the environment variable is expanded and pass the password as -p"$MYSQL_ROOT_PASSWORD", or use a client option file via --defaults-extra-file, ensuring the literal password is removed from the test command.
🧹 Nitpick comments (18)
backend/tests/services/rag/test_retrieval_service.py (1)
334-351: Strengthen no-budget route test by asserting the estimator is not called.

This test currently validates only the returned mode. It should also guard against regressions where token estimation is still executed when `context_window` is `None`.

✅ Suggested test hardening

```diff
-    def test_decide_route_mode_for_chat_shell_returns_rag_retrieval_without_budget(
-        self,
-    ):
+    def test_decide_route_mode_for_chat_shell_returns_rag_retrieval_without_budget(
+        self,
+    ) -> None:
         from app.services.rag.retrieval_service import RetrievalService

         service = RetrievalService()
         db = MagicMock()

-        result = service.decide_route_mode_for_chat_shell(
-            query="test",
-            knowledge_base_ids=[123],
-            db=db,
-            route_mode="auto",
-            context_window=None,
-        )
+        with patch.object(
+            RetrievalService,
+            "_estimate_total_tokens_for_knowledge_bases",
+        ) as mock_estimate:
+            result = service.decide_route_mode_for_chat_shell(
+                query="test",
+                knowledge_base_ids=[123],
+                db=db,
+                route_mode="auto",
+                context_window=None,
+            )

         assert result == "rag_retrieval"
+        mock_estimate.assert_not_called()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `backend/tests/services/rag/test_retrieval_service.py` around lines 334-351: the test should also assert that no token estimation is performed when context_window is None; mock or spy the estimator used by RetrievalService.decide_route_mode_for_chat_shell (e.g., patch the token estimator the method calls) and assert that the mock was not called, while keeping the existing assertion that result == "rag_retrieval".

backend/tests/services/rag/test_content_refs.py (1)
11-26: Extract test magic numbers to named constants.

Please replace inline literals (`101`, `0`, `7`, `42`) with constants to keep intent clear and simplify future updates. As per coding guidelines, "Python: Extract magic numbers to constants".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `backend/tests/services/rag/test_content_refs.py` around lines 11-26: replace the inline magic numbers in _create_attachment_context with module-level named constants (e.g., DEFAULT_ATTACHMENT_ID = 101, DEFAULT_SUBTASK_ID = 0, DEFAULT_USER_ID = 7, DEFAULT_FILE_SIZE = 42) and use them in the SubtaskContext construction; keep the constant names descriptive and all-caps per project style.

backend/tests/services/test_summary_service.py (1)
442-444: Remove redundant assertion to keep the test focused.
`assert_called_once_with(...)` already validates `knowledge_base_id`, so the extra kwargs assertion is duplicate noise.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `backend/tests/services/test_summary_service.py` around lines 442-444: remove the redundant kwargs assertion that checks mock_build_delete_runtime_spec.call_args.kwargs["knowledge_base_id"] == test_knowledge_base.id; the existing assert_called_once_with(...) on mock_build_delete_runtime_spec already verifies the same parameter, so the test should rely on that call alone.

backend/app/api/endpoints/internal/rag.py (3)
170-174: Add type hints to helper function.

The helper function lacks type hints, which are required per coding guidelines.

Proposed type hints

```diff
-def _resolve_query_gateway(runtime_spec):
+def _resolve_query_gateway(runtime_spec) -> RagGateway:
     route_mode = getattr(runtime_spec, "route_mode", "auto")
     if route_mode == "rag_retrieval":
         return get_query_gateway()
     return LocalRagGateway()
```

Note: you'll need to import `RagGateway` from `app.services.rag.gateway` if not already imported. As per coding guidelines, "Python: ... type hints required".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/endpoints/internal/rag.py` around lines 170 - 174, Add type hints to _resolve_query_gateway by annotating its parameter and return type: import Any from typing and annotate runtime_spec: Any, and set the function return type to RagGateway (import RagGateway from app.services.rag.gateway). Ensure the body still checks route_mode with getattr and returns get_query_gateway() or LocalRagGateway() but with the function signature updated to def _resolve_query_gateway(runtime_spec: Any) -> RagGateway:.
203-213: Add type hints and consider logging level.
- Missing return type hint
- The fallback path logs at WARNING level, which is appropriate for transient failures
Proposed type hints
```diff
-async def _execute_query_with_remote_fallback(runtime_spec, db: Session):
+async def _execute_query_with_remote_fallback(runtime_spec, db: Session) -> dict:
     rag_gateway = _resolve_query_gateway(runtime_spec)
```

As per coding guidelines, "Python: ... type hints required".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/endpoints/internal/rag.py` around lines 203 - 213, Add explicit type hints to _execute_query_with_remote_fallback: annotate the function signature with a return type (e.g., async def _execute_query_with_remote_fallback(runtime_spec: RuntimeSpec, db: Session) -> RagQueryResult) or, if RagQueryResult/RuntimeSpec aren't defined yet, use typing.Any for runtime_spec and the return type and then replace with concrete types later; ensure necessary imports (RuntimeSpec, RagQueryResult or typing.Any) are added at the top. Keep the existing logger.warning level as-is for the RemoteRagGatewayError except path; do not change the fallback behavior of LocalRagGateway().query or the use of _resolve_query_gateway and RemoteRagGatewayError.
177-200: Add type hints and consider simplifying attribute checks.
- Missing type hints on `_finalize_query_runtime_spec`
- The `hasattr` check at line 187 followed by `getattr` at line 191 is slightly redundant

Proposed improvements

```diff
-def _finalize_query_runtime_spec(runtime_spec, db: Session):
+def _finalize_query_runtime_spec(runtime_spec, db: Session):  # -> QueryRuntimeSpec
     if getattr(runtime_spec, "route_mode", "auto") != "auto":
         return runtime_spec

     required_attributes = (
         "query",
         "knowledge_base_ids",
         "document_ids",
         "direct_injection_budget",
         "model_copy",
     )
     if not all(hasattr(runtime_spec, attr) for attr in required_attributes):
         return runtime_spec

     retrieval_service = RetrievalService()
-    budget = getattr(runtime_spec, "direct_injection_budget", None)
+    budget = runtime_spec.direct_injection_budget  # Safe after hasattr check
     resolved_route_mode = retrieval_service.decide_route_mode_for_chat_shell(
```

As per coding guidelines, "Python: ... type hints required".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `backend/app/api/endpoints/internal/rag.py` around lines 177-200: add explicit type hints to _finalize_query_runtime_spec (e.g., runtime_spec: Any, db: Session) and a proper return type matching runtime_spec; replace the hasattr(...) checks with a single concise existence check using getattr(runtime_spec, attr, None) for each required attribute to avoid the redundant hasattr/getattr pairing, and keep the subsequent budget assignment and the call to RetrievalService().decide_route_mode_for_chat_shell(...) as-is before returning runtime_spec.model_copy(update={"route_mode": resolved_route_mode}).

backend/app/services/rag/embedding/factory.py (1)
221-224: Redundant `isinstance` checks for `custom_headers`.

The parameter `custom_headers` is already typed as `dict[str, Any]` (line 214), and callers at lines 186 and 202 already ensure it's a dict. The `isinstance(custom_headers, dict)` checks at lines 222-224 and 267 are redundant.

Proposed simplification

```diff
     if (
         custom_headers
-        and isinstance(custom_headers, dict)
         and len(custom_headers) > 0
     ):
```

```diff
-            headers=custom_headers if isinstance(custom_headers, dict) else {},
+            headers=custom_headers,
```

Also applies to: 267-267

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `backend/app/services/rag/embedding/factory.py` around lines 221-224: the conditional branches that check "isinstance(custom_headers, dict)" are redundant because custom_headers is typed as dict[str, Any] and callers already validate it; simplify the condition that currently reads "if custom_headers and isinstance(custom_headers, dict) and len(custom_headers) > 0" to a plain truthy check (e.g., "if custom_headers:"), and remove the duplicate isinstance check in the second occurrence around the same custom_headers usage so the logic relies on the type annotation and call-site guarantees rather than runtime isinstance checks.

backend/app/services/rag/gateway_factory.py (1)
13-17: Consider caching gateway instances.

Each call to `get_*_gateway()` creates a new gateway instance. If gateways are stateless and instantiation is cheap, this is fine. However, if `RemoteRagGateway` or `LocalRagGateway` hold connections or expensive resources, consider caching instances per mode.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `backend/app/services/rag/gateway_factory.py` around lines 13-17: _build_gateway currently creates a new RemoteRagGateway or LocalRagGateway on each call; change it to cache instances per normalized mode so repeated calls return the same gateway when appropriate. Implement a module-level cache (e.g., a _GATEWAY_CACHE dict keyed by normalized mode) and update _build_gateway to check the cache, create and store the gateway on first request, and return the cached instance thereafter; if concurrency is a concern, guard cache mutation with a simple threading.Lock to make get-or-create atomic.
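A minimal sketch of the suggested per-mode cache (the lock and cache names are illustrative, not existing code):

```python
import threading

_GATEWAY_CACHE: dict[str, RagGateway] = {}
_GATEWAY_LOCK = threading.Lock()


def _build_gateway(mode: str) -> RagGateway:
    normalized = mode.strip().lower()
    with _GATEWAY_LOCK:  # make get-or-create atomic
        if normalized not in _GATEWAY_CACHE:
            _GATEWAY_CACHE[normalized] = (
                RemoteRagGateway() if normalized == "remote" else LocalRagGateway()
            )
        return _GATEWAY_CACHE[normalized]
```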
backend/app/services/rag/retrieval_service.py (1)

229-242: Clarify the unused `query` parameter.

The `del query` statement (line 240) is an unconventional way to suppress "unused parameter" warnings. Consider using an underscore-prefixed name or adding a brief comment explaining why the parameter exists but isn't used.

Alternative approaches

Option 1: Underscore prefix in signature

```diff
     def decide_route_mode_for_chat_shell(
         self,
         *,
-        query: str,
+        query: str,  # Reserved for future fine-grained routing; currently unused
         knowledge_base_ids: list[int],
```

Option 2: Use `_` assignment

```diff
-        del query
+        _ = query  # Reserved for future use
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `backend/app/services/rag/retrieval_service.py` around lines 229-242: decide_route_mode_for_chat_shell currently deletes the unused parameter via "del query"; replace this with a clearer pattern by renaming the parameter to _query (or prefixing it with an underscore in the signature) or, if renaming is undesirable, keep the parameter name, add a one-line comment explaining it's intentionally unused (e.g., "# query kept for API compatibility; not used here"), and remove the del query statement; ensure signatures and any callers remain consistent.

knowledge_runtime/knowledge_runtime/api/internal/rag.py (1)
19-40: Instrument internal async endpoints with tracing decorators.

The three route handlers should be traced at function boundaries to keep request-level observability consistent across services. As per coding guidelines, "Use `@trace_async(span_name, tracer_name, extract_attributes)` decorator to trace entire async functions in OpenTelemetry".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `knowledge_runtime/knowledge_runtime/api/internal/rag.py` around lines 19-40: add the OpenTelemetry async tracing decorator to each internal route handler; annotate index_document, query, and delete_document_index with `@trace_async` using a descriptive span_name (e.g., "internal.rag.index_document", "internal.rag.query", "internal.rag.delete_document_index"), a tracer_name (e.g., the service tracer), and an extract_attributes callable that extracts relevant request fields (IDs, query text, tenant info) from the RemoteIndexRequest/RemoteQueryRequest/RemoteDeleteDocumentIndexRequest; ensure the decorator is imported and applied directly above each async function definition so the entire async function is traced.
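A sketch of one traced handler, assuming the project's `@trace_async(span_name, tracer_name, extract_attributes)` decorator quoted above (span name, tracer name, and the extracted attribute are illustrative):

```python
@router.post("/query")
@trace_async(
    "internal.rag.query",  # span_name (illustrative)
    "knowledge_runtime",   # tracer_name (illustrative)
    lambda request: {"knowledge_base_ids": request.knowledge_base_ids},  # assumed field
)
async def query(request: RemoteQueryRequest) -> RemoteQueryResponse:
    ...
```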
backend/app/services/rag/content_refs.py (1)

25-63: Add sync tracing decorator on content-ref construction path.

This sync service function is part of the remote indexing path and should be traced at function boundary. As per coding guidelines, "Use `@trace_sync(span_name, tracer_name, extract_attributes)` decorator to trace entire sync functions in OpenTelemetry".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `backend/app/services/rag/content_refs.py` around lines 25-63: add the OpenTelemetry sync tracer to the content-ref construction entrypoint by decorating build_content_ref_for_attachment with `@trace_sync` using an appropriate span name (e.g., "build_content_ref_for_attachment"), the backend tracer name, and an extract_attributes callable that captures relevant inputs such as attachment_id and expires_delta_seconds; import trace_sync if missing and ensure the decorator sits immediately above the build_content_ref_for_attachment definition so the whole function is traced.

knowledge_runtime/knowledge_runtime/services/handlers.py (1)
17-33: Add tracing decorators to async handler methods.

These runtime async handlers should be wrapped with the project tracing decorator for end-to-end observability. As per coding guidelines, "Use `@trace_async(span_name, tracer_name, extract_attributes)` decorator to trace entire async functions in OpenTelemetry".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `knowledge_runtime/knowledge_runtime/services/handlers.py` around lines 17-33: the async handler methods index_document, query, and delete_document_index need the project tracing decorator; add `@trace_async(...)` above each async def to wrap the entire function, import trace_async if missing, and supply a clear span name, the tracer_name used in the project, and an extract_attributes callable (or a lambda extracting request fields like knowledge_base_id, document_id, content_ref.kind) so the decorator captures request attributes for observability.

knowledge_engine/knowledge_engine/storage/base.py (1)
43-46: Narrow exception type to `json.JSONDecodeError`.

Catching bare `Exception` can mask unexpected errors. Since we're only handling JSON parse failures, use the specific exception:

```diff
     try:
         data = json.loads(stripped)
-    except Exception:
+    except json.JSONDecodeError:
         return raw_content
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `knowledge_engine/knowledge_engine/storage/base.py` around lines 43-46: replace the broad except Exception in the JSON parsing block with the specific json.JSONDecodeError to avoid masking other errors; in the function where json.loads(stripped) is called, catch json.JSONDecodeError and return raw_content as before, ensuring the json module is imported so json.JSONDecodeError is available.

backend/app/api/endpoints/internal/rag_content.py (1)
48-85: Add OpenTelemetry tracing for observability.

Per coding guidelines, async functions should use the `@trace_async` decorator for OpenTelemetry tracing. This internal endpoint would benefit from tracing for debugging RAG content retrieval issues.

```diff
+from app.core.tracing import trace_async
+
 @router.get("/{attachment_id}")
+@trace_async("stream_rag_attachment_content", "rag_content")
 async def stream_rag_attachment_content(
```

As per coding guidelines, "Use `@trace_async(span_name, tracer_name, extract_attributes)` decorator to trace entire async functions in OpenTelemetry".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `backend/app/api/endpoints/internal/rag_content.py` around lines 48-85: wrap the async endpoint function stream_rag_attachment_content with the OpenTelemetry decorator `@trace_async` using a descriptive span name, the appropriate tracer name, and an extract_attributes callable that captures key attributes such as attachment_id and context.status; apply the decorator above the function definition (which currently depends on _verify_rag_download_authorization and get_db) so the full function execution, including calls to context_service.get_context_optional and context_service.get_attachment_binary_data and the final Response creation, is traced for observability.

docker-compose-rag.yml (1)
10-14: Hardcoded credentials should be replaced with environment variable defaults.

While acceptable for local development, these hardcoded passwords could accidentally leak to production. Consider using environment variables with placeholder defaults that force explicit configuration:

```diff
 environment:
-  MYSQL_ROOT_PASSWORD: 123456
+  MYSQL_ROOT_PASSWORD: ${MYSQL_ROOT_PASSWORD:-changeme}
   MYSQL_DATABASE: task_manager
   MYSQL_USER: task_user
-  MYSQL_PASSWORD: task_password
+  MYSQL_PASSWORD: ${MYSQL_PASSWORD:-changeme}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `docker-compose-rag.yml` around lines 10-14: replace the hardcoded DB credentials under the environment block by reading from environment variables with safe placeholder defaults so they aren't baked into the compose file; change the values for MYSQL_ROOT_PASSWORD, MYSQL_DATABASE, MYSQL_USER and MYSQL_PASSWORD to reference env vars (e.g., ${MYSQL_ROOT_PASSWORD:-<placeholder>} style) and update any documentation or .env.example to require explicit configuration for these variables.

knowledge_engine/knowledge_engine/index/indexer.py (1)
45-81: Add concrete types to the new indexing boundary.

`DocumentIndexer` is a new cross-module service surface, but `embed_model`, `**kwargs`, `Dict`, and bare `List` keep most of the contract dynamic. Tightening these signatures now will make the new `knowledge_engine` API much safer to evolve and easier to validate in callers/tests. As per coding guidelines, "Python: Follow PEP 8, Black formatter (line length: 88), isort, type hints required".

Also applies to: 110-115, 162-195

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `knowledge_engine/knowledge_engine/index/indexer.py` around lines 45-81: DocumentIndexer exposes dynamic types; update the public signatures to use concrete type hints: replace embed_model with a specific interface type (e.g., an EmbeddingModel protocol), change return types from bare Dict/List to dict[str, Any] or list[Document] as appropriate, and make **kwargs explicit (e.g., *, namespace: str | None = None, batch_size: int | None = None) in index_document, index_from_binary and _index_documents to document allowed options; apply the same concrete typing changes around lines 110-115 and 162-195 so callers/tests have strict, discoverable contracts.
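A hedged sketch of the kind of tightening meant here; the protocol name and explicit keywords are illustrative, not the module's current API:

```python
from typing import Any, Protocol


class EmbeddingModelProtocol(Protocol):
    def embed(self, texts: list[str]) -> list[list[float]]: ...


def index_document(
    self,
    *,
    knowledge_id: str,
    doc_ref: str,
    content: bytes,
    embed_model: EmbeddingModelProtocol,  # instead of an untyped embed_model
    batch_size: int | None = None,        # explicit instead of **kwargs
) -> dict[str, Any]: ...
```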
backend/app/services/rag/remote_gateway.py (1)

105-178: Add tracing decorators to gateway operations.

The new remote `index`/`query`/`delete` paths are currently untraced; this will make runtime-mode debugging and latency attribution harder. As per coding guidelines, "Use `@trace_async(span_name, tracer_name, extract_attributes)` decorator to trace entire async functions in OpenTelemetry" and "Use `add_span_event(name, attributes)` to add event to current OpenTelemetry span".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `backend/app/services/rag/remote_gateway.py` around lines 105-178: the three async methods index_document, query, and delete_document_index lack OpenTelemetry tracing; wrap each with the `@trace_async(...)` decorator (use a descriptive span_name like "rag.remote.index_document"/"rag.remote.query"/"rag.remote.delete_document_index" plus the appropriate tracer_name and extract_attributes callable) and add span events via add_span_event(...) around key operations (e.g., before/after building the payload and before/after calling self._post_model), including attributes such as knowledge_base_id, document_id/document_ref, index_owner_user_id, and user_name to surface payload and latency details for _post_model calls.

knowledge_engine/knowledge_engine/services/document_service.py (1)
20-93: Instrument service methods with tracing decorators.

Async and sync entrypoints here are not decorated for OpenTelemetry tracing, so this new indexing path will be under-observed in production. As per coding guidelines, "Use `@trace_async(span_name, tracer_name, extract_attributes)` decorator to trace entire async functions in OpenTelemetry" and "Use `@trace_sync(span_name, tracer_name, extract_attributes)` decorator to trace entire sync functions in OpenTelemetry".

Also applies to: 94-196
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_engine/knowledge_engine/services/document_service.py` around lines 20 - 93, Add OpenTelemetry tracing decorators to the async entrypoints and their sync counterparts: annotate async methods index_document_from_binary, index_document_from_file, delete_document, list_documents (and other async entrypoints in lines 94-196) with `@trace_async` using an appropriate span_name (e.g., "index_document_from_binary", "index_document_from_file", "delete_document", "list_documents"), a consistent tracer_name (e.g., "knowledge_engine.document_service"), and the existing extract_attributes callable; likewise annotate the sync implementations _index_document_from_binary_sync and _index_document_from_file_sync (and other sync helpers) with `@trace_sync` using matching span_name/tracer_name/extract_attributes so the full sync+async call path is captured. Ensure imports for trace_async/trace_sync are added and that the extract_attributes you pass extracts relevant parameters (knowledge_id, user_id, document_id, source_file/file_path) for each span.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 00153d5c-f6e2-4f99-919a-857ee854b2c8
⛔ Files ignored due to path filters (2)
- `backend/uv.lock` is excluded by `!**/*.lock`
- `knowledge_engine/uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (75)
- backend/app/api/api.py
- backend/app/api/endpoints/internal/__init__.py
- backend/app/api/endpoints/internal/rag.py
- backend/app/api/endpoints/internal/rag_content.py
- backend/app/core/config.py
- backend/app/services/auth/__init__.py
- backend/app/services/auth/rag_download_token.py
- backend/app/services/knowledge/indexing.py
- backend/app/services/knowledge/knowledge_service.py
- backend/app/services/rag/content_refs.py
- backend/app/services/rag/embedding/factory.py
- backend/app/services/rag/gateway.py
- backend/app/services/rag/gateway_factory.py
- backend/app/services/rag/local_data_plane/indexing.py
- backend/app/services/rag/local_gateway.py
- backend/app/services/rag/remote_gateway.py
- backend/app/services/rag/retrieval_service.py
- backend/app/services/rag/runtime_resolver.py
- backend/app/services/rag/runtime_specs.py
- backend/app/services/rag/storage/factory.py
- backend/pyproject.toml
- backend/tests/api/endpoints/internal/test_rag_content_endpoint.py
- backend/tests/api/endpoints/internal/test_rag_retrieve_endpoint.py
- backend/tests/core/test_config.py
- backend/tests/integration/test_rag_remote_mode.py
- backend/tests/services/knowledge/test_indexing.py
- backend/tests/services/rag/test_content_refs.py
- backend/tests/services/rag/test_local_data_plane_indexing.py
- backend/tests/services/rag/test_local_gateway.py
- backend/tests/services/rag/test_remote_gateway.py
- backend/tests/services/rag/test_retrieval_service.py
- backend/tests/services/rag/test_runtime_resolver.py
- backend/tests/services/rag/test_runtime_specs.py
- backend/tests/services/test_summary_service.py
- backend/tests/tasks/test_knowledge_tasks.py
- docker-compose-rag.yml
- docker/knowledge_runtime/Dockerfile
- docs/plans/2026-04-04-rag-service-extraction-implementation-plan.md
- docs/specs/knowledge/2026-04-03-rag-service-extraction-design.md
- knowledge_engine/knowledge_engine/__init__.py
- knowledge_engine/knowledge_engine/embedding/__init__.py
- knowledge_engine/knowledge_engine/index/__init__.py
- knowledge_engine/knowledge_engine/index/indexer.py
- knowledge_engine/knowledge_engine/query/__init__.py
- knowledge_engine/knowledge_engine/services/__init__.py
- knowledge_engine/knowledge_engine/services/document_service.py
- knowledge_engine/knowledge_engine/splitter/__init__.py
- knowledge_engine/knowledge_engine/splitter/config.py
- knowledge_engine/knowledge_engine/splitter/factory.py
- knowledge_engine/knowledge_engine/splitter/smart.py
- knowledge_engine/knowledge_engine/splitter/splitter.py
- knowledge_engine/knowledge_engine/storage/__init__.py
- knowledge_engine/knowledge_engine/storage/base.py
- knowledge_engine/knowledge_engine/storage/chunk_metadata.py
- knowledge_engine/pyproject.toml
- knowledge_engine/tests/test_document_service.py
- knowledge_engine/tests/test_imports.py
- knowledge_runtime/knowledge_runtime/__init__.py
- knowledge_runtime/knowledge_runtime/api/health.py
- knowledge_runtime/knowledge_runtime/api/internal/__init__.py
- knowledge_runtime/knowledge_runtime/api/internal/rag.py
- knowledge_runtime/knowledge_runtime/api/router.py
- knowledge_runtime/knowledge_runtime/core/config.py
- knowledge_runtime/knowledge_runtime/main.py
- knowledge_runtime/knowledge_runtime/security.py
- knowledge_runtime/knowledge_runtime/services/__init__.py
- knowledge_runtime/knowledge_runtime/services/content_fetcher.py
- knowledge_runtime/knowledge_runtime/services/handlers.py
- knowledge_runtime/pyproject.toml
- knowledge_runtime/start.sh
- knowledge_runtime/tests/test_health.py
- knowledge_runtime/tests/test_internal_rag.py
- shared/models/__init__.py
- shared/models/knowledge_runtime_protocol.py
- shared/tests/test_knowledge_runtime_protocol.py
```python
async def delete_document_index_local(
    knowledge_base_id: int,
    document_ref: str,
    spec: DeleteRuntimeSpec,
    *,
    db: Session,
    index_owner_user_id: int | None = None,
) -> dict:
    """Delete document chunks from the local storage backend."""
    kb = (
        db.query(Kind)
        .filter(
            Kind.id == knowledge_base_id,
            Kind.kind == "KnowledgeBase",
            Kind.is_active,
        )
        .first()
    )
    del db
    storage_backend = create_storage_backend_from_runtime_config(spec.retriever_config)
    service = EngineDocumentService(storage_backend=storage_backend)
    return await service.delete_document(
        knowledge_id=str(spec.knowledge_base_id),
        doc_ref=spec.document_ref,
        user_id=spec.index_owner_user_id,
    )
```
Local delete currently ignores enabled_index_families.
DeleteRuntimeSpec.enabled_index_families is resolved upstream, but this path never forwards it to the engine service. That means local and remote runtime modes can delete different index families for the same request. Please thread spec.enabled_index_families through the delete call, or remove the field until both modes honor the same contract.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/services/rag/local_data_plane/indexing.py` around lines 148 -
161, The local delete implementation in delete_document_index_local ignores
DeleteRuntimeSpec.enabled_index_families; update the call to
EngineDocumentService.delete_document to forward spec.enabled_index_families so
both local and remote modes honor the same contract. Locate
delete_document_index_local and the call to service.delete_document and add the
enabled_index_families (or the exact parameter name expected by
EngineDocumentService.delete_document) argument using the value from
spec.enabled_index_families; ensure any type/None handling matches the engine
method signature.
```python
        self._base_url = (base_url or settings.KNOWLEDGE_RUNTIME_URL).rstrip("/")
        self._token = token if token is not None else settings.INTERNAL_SERVICE_TOKEN
        self._timeout = timeout

    def _build_headers(self) -> dict[str, str]:
        return {"Authorization": f"Bearer {self._token}"}
```
Fail fast when remote token is missing.
If INTERNAL_SERVICE_TOKEN is unset, Line 61 emits Authorization: Bearer None, which turns a config error into opaque auth failures. Validate token in __init__ and raise immediately.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/services/rag/remote_gateway.py` around lines 56 - 62, The
constructor currently assigns self._token from token or
settings.INTERNAL_SERVICE_TOKEN but doesn't validate it, leading _build_headers
to emit "Authorization: Bearer None"; update the __init__ of the RemoteGateway
(or the class containing self._token) to validate that the resolved token is a
non-empty string and raise a clear ValueError (or custom exception) immediately
if missing, and keep _build_headers returning {"Authorization": f"Bearer
{self._token}"} so later code never sees a None token.
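A sketch of the fail-fast guard in `__init__`, keeping the attribute names from the snippet above (the surrounding constructor signature is assumed):

```python
def __init__(
    self,
    base_url: str | None = None,
    token: str | None = None,
    timeout: float | None = None,  # assumed parameter shape
) -> None:
    self._base_url = (base_url or settings.KNOWLEDGE_RUNTIME_URL).rstrip("/")
    resolved = token if token is not None else settings.INTERNAL_SERVICE_TOKEN
    if not resolved:
        raise ValueError("INTERNAL_SERVICE_TOKEN must be configured for RemoteRagGateway")
    self._token = resolved
    self._timeout = timeout
```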
```python
    async def _post_model(self, path: str, payload: Any) -> dict[str, Any]:
        async with httpx.AsyncClient(timeout=self._timeout) as client:
            response = await client.post(
                f"{self._base_url}{path}",
                headers=self._build_headers(),
                json=payload.model_dump(mode="json", exclude_none=True),
            )

        if response.is_error:
            self._raise_remote_error(response)
        return response.json()
```
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
# Verify fallback catches and exception normalization points.
rg -n -C3 'except\s+RemoteRagGatewayError|RemoteQueryResponse\.model_validate\(|httpx\.(RequestError|HTTPError)' backend/app/services/rag backend/app/api/endpoints/internal
```

Repository: wecode-ai/Wegent
Length of output: 1366

🏁 Script executed:

```bash
#!/bin/bash
# Retrieve full _post_model method and surrounding context
fd -type f -name 'remote_gateway.py' backend/app/services/rag
```

Repository: wecode-ai/Wegent
Length of output: 230

🏁 Script executed:

```bash
#!/bin/bash
# Get line count to determine safe read strategy
wc -l backend/app/services/rag/remote_gateway.py
```

Repository: wecode-ai/Wegent
Length of output: 106

🏁 Script executed:

```bash
#!/bin/bash
cat -n backend/app/services/rag/remote_gateway.py
```

Repository: wecode-ai/Wegent
Length of output: 7260

🏁 Script executed:

```bash
#!/bin/bash
# Search for all callers of query() and _post_model() methods
rg -n '\.query\(|\.\_post_model\(' backend/app --type py
```

Repository: wecode-ai/Wegent
Length of output: 164

🏁 Script executed:

```bash
#!/bin/bash
# Search for all callers of query() and _post_model() methods
rg -n '\.query\(|_post_model\(' backend/app --type py
```

Repository: wecode-ai/Wegent
Length of output: 50372

🏁 Script executed:

```bash
#!/bin/bash
# Search specifically for RemoteRagGateway.query or rag_gateway.query calls
rg -n 'rag_gateway\.query|RemoteRagGateway' backend/app/api/endpoints/internal/rag.py -A 3 -B 3
```

Repository: wecode-ai/Wegent
Length of output: 855
Normalize all remote failures to RemoteRagGatewayError.
Lines 64-69 (_post_model) can raise unhandled httpx transport exceptions (e.g., RequestError, ConnectError, TimeoutError), and line 158 (query) can raise ValidationError from schema validation. Both bypass the gateway-level error contract; the caller in backend/app/api/endpoints/internal/rag.py:207 catches only RemoteRagGatewayError to trigger remote→local fallback. These unhandled exceptions break the fallback mechanism and crash the operation instead.
Wrap both sites in try-except and convert to RemoteRagGatewayError:
🔧 Suggested patch

```diff
 async def _post_model(self, path: str, payload: Any) -> dict[str, Any]:
-    async with httpx.AsyncClient(timeout=self._timeout) as client:
-        response = await client.post(
-            f"{self._base_url}{path}",
-            headers=self._build_headers(),
-            json=payload.model_dump(mode="json", exclude_none=True),
-        )
+    try:
+        async with httpx.AsyncClient(timeout=self._timeout) as client:
+            response = await client.post(
+                f"{self._base_url}{path}",
+                headers=self._build_headers(),
+                json=payload.model_dump(mode="json", exclude_none=True),
+            )
+    except httpx.RequestError as exc:
+        raise RemoteRagGatewayError(
+            f"knowledge_runtime request failed: {exc}",
+            code="remote_request_failed",
+            retryable=True,
+        ) from exc
     if response.is_error:
         self._raise_remote_error(response)
     return response.json()
@@
     response_payload = await self._post_model("/internal/rag/query", payload)
-    response = RemoteQueryResponse.model_validate(response_payload)
+    try:
+        response = RemoteQueryResponse.model_validate(response_payload)
+    except ValidationError as exc:
+        raise RemoteRagGatewayError(
+            "knowledge_runtime returned invalid query payload",
+            code="remote_response_invalid",
+            status_code=502,
+            details={"validation_errors": exc.errors()},
+        ) from exc
```
Verify each finding against the current code and only fix it if needed.
In `@backend/app/services/rag/remote_gateway.py` around lines 63 - 74, The
_post_model and query code paths must normalize all remote-related exceptions
into RemoteRagGatewayError so the caller can trigger fallback; wrap the body of
async def _post_model(self, path: str, payload: Any) (the httpx.AsyncClient.post
call and response handling) in a try/except that catches httpx.RequestError (and
other httpx transport/errors) and re-raises RemoteRagGatewayError (including the
original exception message), while keeping response.is_error logic that calls
_raise_remote_error; likewise wrap the logic in query (the schema validation and
any downstream remote calls that raise pydantic.ValidationError or httpx errors)
in a try/except that catches ValidationError and httpx transport exceptions and
converts them into RemoteRagGatewayError carrying the original exception
details. Ensure you preserve existing calls to _raise_remote_error for HTTP
response errors and attach original exception info when constructing
RemoteRagGatewayError.
```python
    def _get_model_kind(
        self,
        *,
        db: Session,
        user_id: int,
        model_name: str,
        model_namespace: str,
    ):
        if model_namespace == "default":
            return (
                db.query(Kind)
                .filter(
                    Kind.kind == "Model",
                    Kind.name == model_name,
                    Kind.namespace == model_namespace,
                    Kind.is_active == True,
                )
                .filter((Kind.user_id == user_id) | (Kind.user_id == 0))
                .order_by(Kind.user_id.desc())
                .first()
            )
        return (
            db.query(Kind)
            .filter(
                Kind.kind == "Model",
                Kind.name == model_name,
                Kind.namespace == model_namespace,
                Kind.is_active == True,
            )
            .first()
        )
```
Scope non-default model lookups by user_id too.
The non-default branch drops the user_id predicate entirely. If two tenants define the same namespace/name, this can resolve the wrong model and forward that tenant’s decrypted API key / headers into the runtime config. Keep the tenant/shared-model filter in both branches.
As per coding guidelines, "Kind resource is uniquely identified by THREE fields: namespace, name, and user_id - always use all three fields to locate a Kind".
🧰 Tools
🪛 Ruff (0.15.9)
[error] 357-357: Avoid equality comparisons to `True`; use `if Kind.is_active:` for truth checks. Replace with `Kind.is_active` (E712)

[error] 369-369: Avoid equality comparisons to `True`; use `if Kind.is_active:` for truth checks. Replace with `Kind.is_active` (E712)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/services/rag/runtime_resolver.py` around lines 342 - 372, The
non-default branch of _get_model_kind currently omits the user_id predicate,
risking cross-tenant lookup; update the second return to include the same
tenant/shared filter used in the default branch: add .filter((Kind.user_id ==
user_id) | (Kind.user_id == 0)) and .order_by(Kind.user_id.desc()) before
.first() so the Kind query always uses namespace, name, and user_id to resolve
the correct Model Kind.
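A sketch of the non-default branch with the tenant filter retained, mirroring the default branch (this also resolves the E712 findings above):

```python
return (
    db.query(Kind)
    .filter(
        Kind.kind == "Model",
        Kind.name == model_name,
        Kind.namespace == model_namespace,
        Kind.is_active,  # truthy check instead of == True (E712)
    )
    .filter((Kind.user_id == user_id) | (Kind.user_id == 0))  # tenant or shared
    .order_by(Kind.user_id.desc())  # prefer the tenant-owned model
    .first()
)
```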
```python
    token = authorization[7:].strip()
    if token != settings.INTERNAL_SERVICE_TOKEN:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Invalid internal service token",
        )
```
Guard against empty internal token configuration (auth bypass risk).
If INTERNAL_SERVICE_TOKEN is empty, a blank bearer token can authenticate. Reject startup/runtime auth checks when the configured token is empty.
🔒 Suggested fix
```diff
 def verify_internal_service_token(authorization: str = Header(default="")) -> None:
+    if not settings.INTERNAL_SERVICE_TOKEN:
+        raise HTTPException(
+            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
+            detail="Internal service token is not configured",
+        )
+
     if not authorization.startswith("Bearer "):
         raise HTTPException(
             status_code=status.HTTP_401_UNAUTHORIZED,
             detail="Missing or invalid Authorization header",
         )
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def verify_internal_service_token(authorization: str = Header(default="")) -> None:
    if not settings.INTERNAL_SERVICE_TOKEN:
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail="Internal service token is not configured",
        )
    if not authorization.startswith("Bearer "):
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Missing or invalid Authorization header",
        )
    token = authorization[7:].strip()
    if token != settings.INTERNAL_SERVICE_TOKEN:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Invalid internal service token",
        )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@knowledge_runtime/knowledge_runtime/security.py` around lines 17 - 22, The
current auth check slices the incoming bearer token into token and compares it
to settings.INTERNAL_SERVICE_TOKEN but does not guard against an empty
configured token; update the code around the token extraction (the token
variable and the comparison to settings.INTERNAL_SERVICE_TOKEN) to first assert
that settings.INTERNAL_SERVICE_TOKEN is non-empty (e.g., if not
settings.INTERNAL_SERVICE_TOKEN: raise an HTTPException indicating
misconfiguration/startup error), and then proceed to compare the incoming token
to that value and raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED,
detail="Invalid internal service token") when they don’t match; this ensures an
empty INTERNAL_SERVICE_TOKEN cannot be used to bypass authentication.
```python
    if content_ref.kind == "backend_attachment_stream":
        headers["Authorization"] = f"Bearer {content_ref.auth_token}"

    async with httpx.AsyncClient() as client:
        response = await client.get(content_ref.url, headers=headers)
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Definitions / builders for backend attachment refs:"
rg -n -C4 --type=py 'BackendAttachmentStreamContentRef|backend_attachment_stream'
echo
echo "Any host/origin validation before auth_token is forwarded:"
rg -n -C4 --type=py 'urlsplit|urlparse|netloc|allowlist|trusted.*(host|origin)|BACKEND_.*(URL|HOST)'Repository: wecode-ai/Wegent
Length of output: 50372
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== How knowledge_runtime receives RemoteIndexRequest ==="
rg -n 'RemoteIndexRequest|content_ref' knowledge_runtime/ --type=py -A 3 -B 3 | head -150
echo
echo "=== Knowledge runtime route/endpoint definitions ==="
rg -n 'router|@app|def.*index|RemoteIndexRequest' knowledge_runtime/knowledge_runtime/ --type=py | grep -E '(router|@|def )' | head -40
echo
echo "=== Check if content_ref can be passed as external input ==="
rg -n 'RemoteIndexRequest.*model_validate|from_dict|parse_obj' knowledge_runtime/ --type=py -A 2

Repository: wecode-ai/Wegent
Length of output: 8944
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== verify_internal_service_token implementation ==="
rg -n 'verify_internal_service_token|def verify' knowledge_runtime/ --type=py -A 10
echo
echo "=== How does the backend build and send RemoteIndexRequest? ==="
rg -n 'RemoteIndexRequest|backend_attachment_stream' backend/app/ --type=py -B 2 -A 2 | head -60
echo
echo "=== Search for settings.BACKEND_INTERNAL_URL usage ==="
rg -n 'BACKEND_INTERNAL_URL' backend/ --type=py | head -10

Repository: wecode-ai/Wegent
Length of output: 5857
Validate the backend origin before forwarding the download token.
The code sends content_ref.auth_token to any URL in content_ref.url without checking that it belongs to the backend. While the builder in backend/app/services/rag/content_refs.py hard-pins the URL to settings.BACKEND_INTERNAL_URL, knowledge_runtime itself has no such validation. If INTERNAL_SERVICE_TOKEN is leaked or an internal caller is compromised, this allows credential exfiltration to an arbitrary host.
Add an origin check in fetch_content() (around line 19) to ensure backend_attachment_stream URLs match the expected backend origin.
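A minimal sketch of such a guard, assuming the runtime exposes a BACKEND_INTERNAL_URL setting mirroring the Backend's (the helper name _assert_backend_origin is hypothetical):

from urllib.parse import urlsplit

from knowledge_runtime.core.config import settings  # assumes a BACKEND_INTERNAL_URL field exists here


def _assert_backend_origin(url: str) -> None:
    # Compare scheme + host + port so the token can only travel to the
    # origin the Backend itself advertises.
    expected = urlsplit(settings.BACKEND_INTERNAL_URL)
    actual = urlsplit(url)
    if (actual.scheme, actual.netloc) != (expected.scheme, expected.netloc):
        raise ValueError(f"Refusing to forward auth token to untrusted origin: {actual.netloc!r}")

fetch_content() would call _assert_backend_origin(content_ref.url) before attaching the Authorization header, so a forged content_ref cannot redirect the token elsewhere.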
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@knowledge_runtime/knowledge_runtime/services/content_fetcher.py` around lines
19 - 23, In fetch_content(), avoid unconditionally forwarding
content_ref.auth_token to arbitrary URLs: before setting
headers["Authorization"] when content_ref.kind == "backend_attachment_stream",
parse content_ref.url and verify its origin (scheme + host + optional port)
matches the expected backend origin (use settings.BACKEND_INTERNAL_URL or a
derived origin constant) and only then attach the Authorization header; if the
origin does not match, do not include the token and raise/return an error (or
log and abort) to prevent credential exfiltration. Ensure you reference
content_ref.kind, content_ref.url, content_ref.auth_token and
settings.BACKEND_INTERNAL_URL in the check.
async def index_document(self, request: RemoteIndexRequest) -> dict:
    content = await content_fetcher.fetch_content(request.content_ref)
    return {
        "status": "accepted",
        "knowledge_id": str(request.knowledge_base_id),
        "document_id": request.document_id,
        "content_bytes": len(content),
        "content_ref_kind": request.content_ref.kind,
        "index_families": request.index_families,
    }

async def query(self, request: RemoteQueryRequest) -> RemoteQueryResponse:
    return RemoteQueryResponse(records=[], total=0, total_estimated_tokens=0)

async def delete_document_index(
    self, request: RemoteDeleteDocumentIndexRequest
) -> dict:
    return {
        "status": "accepted",
        "knowledge_id": str(request.knowledge_base_id),
        "doc_ref": request.document_ref,
        "enabled_index_families": request.enabled_index_families,
    }
Avoid returning “accepted” from no-op runtime operations.
Lines 17 through 39 acknowledge index, query, and delete requests without executing any engine operations. In remote mode, this can report success while doing no real work.
🛠️ Fail-fast alternative until engine wiring is complete
class RuntimeHandlers:
async def index_document(self, request: RemoteIndexRequest) -> dict:
- content = await content_fetcher.fetch_content(request.content_ref)
- return {
- "status": "accepted",
- "knowledge_id": str(request.knowledge_base_id),
- "document_id": request.document_id,
- "content_bytes": len(content),
- "content_ref_kind": request.content_ref.kind,
- "index_families": request.index_families,
- }
+ raise NotImplementedError("index_document is not wired to knowledge_engine yet")
async def query(self, request: RemoteQueryRequest) -> RemoteQueryResponse:
- return RemoteQueryResponse(records=[], total=0, total_estimated_tokens=0)
+ raise NotImplementedError("query is not wired to knowledge_engine yet")
async def delete_document_index(
self, request: RemoteDeleteDocumentIndexRequest
) -> dict:
- return {
- "status": "accepted",
- "knowledge_id": str(request.knowledge_base_id),
- "doc_ref": request.document_ref,
- "enabled_index_families": request.enabled_index_families,
- }
+ raise NotImplementedError("delete_document_index is not wired to knowledge_engine yet")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
async def index_document(self, request: RemoteIndexRequest) -> dict:
    raise NotImplementedError("index_document is not wired to knowledge_engine yet")

async def query(self, request: RemoteQueryRequest) -> RemoteQueryResponse:
    raise NotImplementedError("query is not wired to knowledge_engine yet")

async def delete_document_index(
    self, request: RemoteDeleteDocumentIndexRequest
) -> dict:
    raise NotImplementedError("delete_document_index is not wired to knowledge_engine yet")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@knowledge_runtime/knowledge_runtime/services/handlers.py` around lines 17 -
39, The current handlers index_document, query, and delete_document_index return
"accepted" without performing engine work; change them to fail-fast while the
engine is unwired by either raising NotImplementedError (with a clear message
containing request.document_id/request.knowledge_base_id where applicable) or
returning an explicit error payload (e.g., status: "not_implemented" and reason
including the request identifiers) from index_document, query
(RemoteQueryResponse should indicate error or raise), and delete_document_index
so we do not report success for no-op operations.
class BackendAttachmentStreamContentRef(KnowledgeRuntimeProtocolModel):
    """Content reference resolved by streaming through Backend."""

    kind: Literal["backend_attachment_stream"]
    url: str
    auth_token: str
    expires_at: datetime | None = None
❓ Verification inconclusive
Script executed:
cat -n shared/models/knowledge_runtime_protocol.py | head -80

Repository: wecode-ai/Wegent
Exit code: 0
stdout:
1 # SPDX-FileCopyrightText: 2025 Weibo, Inc.
2 #
3 # SPDX-License-Identifier: Apache-2.0
4
5 """Lightweight transport models for Backend <-> knowledge_runtime."""
6
7 from __future__ import annotations
8
9 from datetime import datetime
10 from typing import Annotated, Any, Literal
11
12 from pydantic import BaseModel, ConfigDict, Field
13
14
15 class KnowledgeRuntimeProtocolModel(BaseModel):
16 """Base protocol model with strict field validation."""
17
18 model_config = ConfigDict(extra="forbid")
19
20
21 class BackendAttachmentStreamContentRef(KnowledgeRuntimeProtocolModel):
22 """Content reference resolved by streaming through Backend."""
23
24 kind: Literal["backend_attachment_stream"]
25 url: str
26 auth_token: str
27 expires_at: datetime | None = None
28
29
30 class PresignedUrlContentRef(KnowledgeRuntimeProtocolModel):
31 """Content reference resolved directly from object storage."""
32
33 kind: Literal["presigned_url"]
34 url: str
35 expires_at: datetime | None = None
36
37
38 ContentRef = Annotated[
39 BackendAttachmentStreamContentRef | PresignedUrlContentRef,
40 Field(discriminator="kind"),
41 ]
42
43 RetrievalPolicy = Literal[
44 "chunk_only",
45 "summary_first",
46 "summary_then_chunk_expand",
47 "hybrid",
48 ]
49
50 RetrievalMode = Literal["vector", "keyword", "hybrid"]
51
52
53 class KnowledgeRuntimeAuth(KnowledgeRuntimeProtocolModel):
54 """Simple internal auth carrier for the runtime service."""
55
56 scheme: Literal["bearer"] = "bearer"
57 token: str
58
59
60 class RemoteRagError(KnowledgeRuntimeProtocolModel):
61 """Standardized remote error payload."""
62
63 code: str
64 message: str
65 retryable: bool = False
66 details: dict[str, Any] | None = None
67
68
69 class RuntimeRetrieverConfig(KnowledgeRuntimeProtocolModel):
70 """Resolved retriever identity and storage configuration."""
71
72 name: str
73 namespace: str = "default"
74 storage_config: dict[str, Any] = Field(default_factory=dict)
75
76
77 class RuntimeEmbeddingModelConfig(KnowledgeRuntimeProtocolModel):
78 """Resolved embedding model configuration."""
79
80 model_name: str
Script executed:
# Check for SecretStr imports or similar masking mechanisms
rg -i "secretstr|secret" shared/models/knowledge_runtime_protocol.pyRepository: wecode-ai/Wegent
Repository: wecode-ai/Wegent
Exit code: 1
Script executed:
# Search for logging configurations that might expose model data
rg -r "logging|logger|log\(" shared/models/knowledge_runtime_protocol.py -A 3Repository: wecode-ai/Wegent
Repository: wecode-ai/Wegent
Exit code: 0
stdout:
docs/plans/2026-04-04-rag-service-extraction-implementation-plan.md:- `logging|logger|log\(`
docs/plans/2026-04-04-rag-service-extraction-implementation-plan.md-- `shared/models/__init__.py`
docs/plans/2026-04-04-rag-service-extraction-implementation-plan.md-- `shared/tests/test_knowledge_runtime_protocol.py`
docs/plans/2026-04-04-rag-service-extraction-implementation-plan.md-
--
docs/plans/2026-04-04-rag-service-extraction-implementation-plan.md:- Modify: `logging|logger|log\(`
docs/plans/2026-04-04-rag-service-extraction-implementation-plan.md-- Modify: `shared/models/__init__.py`
docs/plans/2026-04-04-rag-service-extraction-implementation-plan.md-- Modify: `shared/tests/test_knowledge_runtime_protocol.py`
docs/plans/2026-04-04-rag-service-extraction-implementation-plan.md-- Modify: `backend/app/services/rag/runtime_specs.py`
Script executed:
# Check for custom __repr__ or __str__ overrides
ast-grep --pattern "def __repr__($$$) { $$$ }" shared/models/knowledge_runtime_protocol.py

Repository: wecode-ai/Wegent
Exit code: 1
Script executed:
# Search for pydantic config related to serialization
rg "class Config|model_config|json_schema" shared/models/knowledge_runtime_protocol.py -A 5Repository: wecode-ai/Wegent
Repository: wecode-ai/Wegent
Exit code: 0
stdout:
model_config = ConfigDict(extra="forbid")
class BackendAttachmentStreamContentRef(KnowledgeRuntimeProtocolModel):
"""Content reference resolved by streaming through Backend."""
--
embedding_model_config: RuntimeEmbeddingModelConfig
retrieval_config: RuntimeRetrievalConfig
class RemoteIndexRequest(KnowledgeRuntimeProtocolModel):
"""Index request sent from Backend to knowledge_runtime."""
--
embedding_model_config: RuntimeEmbeddingModelConfig
splitter_config: dict[str, Any] | None = None
index_families: list[str] = Field(default_factory=lambda: ["chunk_vector"])
content_ref: ContentRef
trace_context: dict[str, Any] | None = None
user_name: str | None = None
Script executed:
# Find files that import or use these models
rg "BackendAttachmentStreamContentRef|KnowledgeRuntimeAuth" --type py -lRepository: wecode-ai/Wegent
Repository: wecode-ai/Wegent
Exit code: 0
stdout:
shared/tests/test_knowledge_runtime_protocol.py
knowledge_runtime/knowledge_runtime/services/content_fetcher.py
shared/models/knowledge_runtime_protocol.py
shared/models/__init__.py
knowledge_runtime/tests/test_internal_rag.py
backend/app/services/rag/content_refs.py
Script executed:
# Search for serialization patterns and HTTP calls using these models
rg "model_dump|json|serialize" shared/models/knowledge_runtime_protocol.py -A 3 -B 3Repository: wecode-ai/Wegent
Repository: wecode-ai/Wegent
Exit code: 1
Script executed:
# Check if there are any Field() definitions with exclude or other masking
rg "Field\(" shared/models/knowledge_runtime_protocol.pyRepository: wecode-ai/Wegent
Repository: wecode-ai/Wegent
Exit code: 0
stdout:
Field(discriminator="kind"),
storage_config: dict[str, Any] = Field(default_factory=dict)
resolved_config: dict[str, Any] = Field(default_factory=dict)
index_families: list[str] = Field(default_factory=lambda: ["chunk_vector"])
enabled_index_families: list[str] = Field(default_factory=lambda: ["chunk_vector"])
knowledge_base_configs: list[RemoteKnowledgeBaseQueryConfig] = Field(
enabled_index_families: list[str] = Field(default_factory=lambda: ["chunk_vector"])
Script executed:
# Look for logging calls in knowledge_runtime related files
find . -type f -name "*.py" -path "*/knowledge*" | xargs rg "logger|logging\..*\(.*auth_token|token\)" 2>/dev/null | head -20Repository: wecode-ai/Wegent
Repository: wecode-ai/Wegent
Exit code: 0
stdout:
./knowledge_runtime/knowledge_runtime/api/internal/rag.py: _: None = Depends(verify_internal_service_token),
./knowledge_runtime/knowledge_runtime/api/internal/rag.py: _: None = Depends(verify_internal_service_token),
./knowledge_runtime/knowledge_runtime/api/internal/rag.py: _: None = Depends(verify_internal_service_token),
./knowledge_engine/knowledge_engine/storage/base.py:logger = logging.getLogger(__name__)
./knowledge_engine/knowledge_engine/index/indexer.py:logger = logging.getLogger(__name__)
./knowledge_engine/knowledge_engine/index/indexer.py: logger.warning(
./chat_shell/chat_shell/tools/knowledge_injection_strategy.py:logger = logging.getLogger(__name__)
./chat_shell/chat_shell/tools/knowledge_injection_strategy.py: logger.info(
./chat_shell/chat_shell/tools/knowledge_injection_strategy.py: logger.debug(
./chat_shell/chat_shell/tools/knowledge_injection_strategy.py: logger.info(
./chat_shell/chat_shell/tools/knowledge_injection_strategy.py: logger.info(
./chat_shell/chat_shell/tools/knowledge_injection_strategy.py: logger.info(
./chat_shell/chat_shell/tools/knowledge_injection_strategy.py: logger.info(
./chat_shell/chat_shell/tools/knowledge_injection_strategy.py: logger.info(
./chat_shell/chat_shell/tools/knowledge_content_cleaner.py:logger = logging.getLogger(__name__)
./chat_shell/chat_shell/tools/knowledge_content_cleaner.py: logger.debug(
./chat_shell/chat_shell/tools/knowledge_content_cleaner.py: logger.info(
./chat_shell/chat_shell/tools/knowledge_content_cleaner.py: original_tokens = int(len(content) / chars_per_token)
./chat_shell/chat_shell/tools/knowledge_content_cleaner.py: cleaned_tokens = int(len(cleaned_content) / chars_per_token)
./chat_shell/chat_shell/tools/knowledge_factory.py:logger = logging.getLogger(__name__)
Script executed:
# Check test files for expected behavior
find . -type f -name "*knowledge_runtime*test*.py" -o -name "test_*knowledge_runtime*.py" | head -5Repository: wecode-ai/Wegent
Repository: wecode-ai/Wegent
Exit code: 0
stdout:
./shared/tests/test_knowledge_runtime_protocol.py
Script executed:
# Check how BackendAttachmentStreamContentRef.auth_token is used
cat -n backend/app/services/rag/content_refs.py | grep -A 10 -B 10 "auth_token\|BackendAttachmentStreamContentRef"

Repository: wecode-ai/Wegent
Exit code: 0
stdout:
8
9 from datetime import datetime, timedelta, timezone
10
11 from sqlalchemy.orm import Session
12
13 from app.core.config import settings
14 from app.models.subtask_context import ContextType
15 from app.services.auth.rag_download_token import create_rag_download_token
16 from app.services.context import context_service
17 from shared.models import (
18 BackendAttachmentStreamContentRef,
19 PresignedUrlContentRef,
20 )
21
22 DEFAULT_CONTENT_REF_TTL_SECONDS = 300
23
24
25 def build_content_ref_for_attachment(
26 *,
27 db: Session,
28 attachment_id: int,
29 expires_delta_seconds: int = DEFAULT_CONTENT_REF_TTL_SECONDS,
30 ) -> BackendAttachmentStreamContentRef | PresignedUrlContentRef:
31 """Build a content reference for an attachment used by remote indexing."""
32
33 context = context_service.get_context_optional(
34 db=db,
35 context_id=attachment_id,
36 )
37 if context is None or context.context_type != ContextType.ATTACHMENT.value:
38 raise ValueError(f"Attachment {attachment_id} not found")
39
40 expires_at = datetime.now(timezone.utc) + timedelta(seconds=expires_delta_seconds)
--
43 context=context,
44 expires=expires_delta_seconds,
45 )
46 if presigned_url:
47 return PresignedUrlContentRef(
48 kind="presigned_url",
49 url=presigned_url,
50 expires_at=expires_at,
51 )
52
53 auth_token = create_rag_download_token(
54 attachment_id=attachment_id,
55 expires_delta_seconds=expires_delta_seconds,
56 )
57 base_url = settings.BACKEND_INTERNAL_URL.rstrip("/")
58 return BackendAttachmentStreamContentRef(
59 kind="backend_attachment_stream",
60 url=f"{base_url}{settings.API_PREFIX}/internal/rag/content/{attachment_id}",
61 auth_token=auth_token,
62 expires_at=expires_at,
63 )
Script executed:
# Check knowledge_runtime content_fetcher usage
cat -n knowledge_runtime/knowledge_runtime/services/content_fetcher.py | head -100

Repository: wecode-ai/Wegent
Exit code: 0
stdout:
1 # SPDX-FileCopyrightText: 2026 Weibo, Inc.
2 #
3 # SPDX-License-Identifier: Apache-2.0
4
5 from __future__ import annotations
6
7 import httpx
8
9 from shared.models.knowledge_runtime_protocol import (
10 BackendAttachmentStreamContentRef,
11 PresignedUrlContentRef,
12 )
13
14
15 async def fetch_content(
16 content_ref: BackendAttachmentStreamContentRef | PresignedUrlContentRef,
17 ) -> bytes:
18 headers = {}
19 if content_ref.kind == "backend_attachment_stream":
20 headers["Authorization"] = f"Bearer {content_ref.auth_token}"
21
22 async with httpx.AsyncClient() as client:
23 response = await client.get(content_ref.url, headers=headers)
24 response.raise_for_status()
25 return response.content
Script executed:
# Check test file to see expected behavior
cat -n shared/tests/test_knowledge_runtime_protocol.py

Repository: wecode-ai/Wegent
Exit code: 0
stdout:
1 # SPDX-FileCopyrightText: 2025 Weibo, Inc.
2 #
3 # SPDX-License-Identifier: Apache-2.0
4
5 from __future__ import annotations
6
7 import pytest
8 from pydantic import ValidationError
9
10 import shared.models as shared_models
11
12
13 def _require_model(name: str):
14 model = getattr(shared_models, name, None)
15 if model is None:
16 pytest.fail(f"shared.models must export {name}")
17 return model
18
19
20 def test_shared_models_exports_knowledge_runtime_protocol_types() -> None:
21 exported_names = [
22 "BackendAttachmentStreamContentRef",
23 "PresignedUrlContentRef",
24 "RuntimeEmbeddingModelConfig",
25 "RuntimeRetrievalConfig",
26 "RuntimeRetrieverConfig",
27 "RemoteKnowledgeBaseQueryConfig",
28 "RemoteDeleteDocumentIndexRequest",
29 "RemoteIndexRequest",
30 "RemoteQueryRequest",
31 "RemoteQueryRecord",
32 "RemoteQueryResponse",
33 ]
34
35 for name in exported_names:
36 assert getattr(shared_models, name, None) is not None
37
38
39 def test_remote_index_request_accepts_backend_attachment_stream_content_ref() -> None:
40 remote_index_request = _require_model("RemoteIndexRequest")
41
42 request = remote_index_request.model_validate(
43 {
44 "knowledge_base_id": 11,
45 "document_id": 22,
46 "index_owner_user_id": 33,
47 "retriever_config": {
48 "name": "retriever-a",
49 "namespace": "default",
50 "storage_config": {
51 "type": "milvus",
52 "url": "http://milvus:19530",
53 },
54 },
55 "embedding_model_config": {
56 "model_name": "text-embedding-3-large",
57 "model_namespace": "default",
58 "resolved_config": {"protocol": "openai"},
59 },
60 "index_families": ["chunk_vector"],
61 "content_ref": {
62 "kind": "backend_attachment_stream",
63 "url": "http://backend:8000/api/internal/rag/content/22",
64 "auth_token": "test-token",
65 },
66 }
67 )
68
69 assert request.content_ref.kind == "backend_attachment_stream"
70 assert request.content_ref.auth_token == "test-token"
71
72
73 def test_remote_index_request_rejects_unknown_content_ref_kind() -> None:
74 remote_index_request = _require_model("RemoteIndexRequest")
75
76 with pytest.raises(ValidationError):
77 remote_index_request.model_validate(
78 {
79 "knowledge_base_id": 11,
80 "document_id": 22,
81 "index_owner_user_id": 33,
82 "retriever_config": {"provider": "milvus"},
83 "embedding_model_config": {"model": "text-embedding-3-large"},
84 "index_families": ["chunk_vector"],
85 "content_ref": {
86 "kind": "unsupported_kind",
87 "url": "http://backend:8000/api/internal/rag/content/22",
88 },
89 }
90 )
91
92
93 def test_remote_query_response_preserves_index_family_per_record() -> None:
94 remote_query_response = _require_model("RemoteQueryResponse")
95
96 response = remote_query_response.model_validate(
97 {
98 "records": [
99 {
100 "content": "Chunk A",
101 "title": "Doc A",
102 "score": 0.91,
103 "knowledge_base_id": 1001,
104 "index_family": "chunk_vector",
105 },
106 {
107 "content": "Summary B",
108 "title": "Doc B",
109 "score": 0.88,
110 "knowledge_base_id": 1002,
111 "index_family": "summary_vector",
112 },
113 ],
114 "total": 2,
115 }
116 )
117
118 assert [record.index_family for record in response.records] == [
119 "chunk_vector",
120 "summary_vector",
121 ]
122
123
124 def test_remote_query_request_accepts_explicit_execution_configs() -> None:
125 remote_query_request = _require_model("RemoteQueryRequest")
126
127 request = remote_query_request.model_validate(
128 {
129 "knowledge_base_ids": [1001],
130 "query": "release checklist",
131 "max_results": 6,
132 "knowledge_base_configs": [
133 {
134 "knowledge_base_id": 1001,
135 "index_owner_user_id": 42,
136 "retriever_config": {
137 "name": "retriever-a",
138 "namespace": "default",
139 "storage_config": {
140 "type": "qdrant",
141 "url": "http://qdrant:6333",
142 "indexStrategy": {"mode": "per_dataset"},
143 },
144 },
145 "embedding_model_config": {
146 "model_name": "embed-a",
147 "model_namespace": "default",
148 "resolved_config": {
149 "protocol": "openai",
150 "model_id": "text-embedding-3-small",
151 "base_url": "https://api.openai.com/v1",
152 },
153 },
154 "retrieval_config": {
155 "top_k": 8,
156 "score_threshold": 0.55,
157 "retrieval_mode": "hybrid",
158 "vector_weight": 0.8,
159 "keyword_weight": 0.2,
160 },
161 }
162 ],
163 "enabled_index_families": ["chunk_vector", "summary_vector_index"],
164 "retrieval_policy": "summary_then_chunk_expand",
165 }
166 )
167
168 assert (
169 request.knowledge_base_configs[0].retriever_config.storage_config["type"]
170 == "qdrant"
171 )
172 assert request.knowledge_base_configs[0].retrieval_config.top_k == 8
173 assert request.enabled_index_families == [
174 "chunk_vector",
175 "summary_vector_index",
176 ]
177
178
179 def test_remote_delete_request_requires_resolved_retriever_config() -> None:
180 remote_delete_request = _require_model("RemoteDeleteDocumentIndexRequest")
181
182 request = remote_delete_request.model_validate(
183 {
184 "knowledge_base_id": 101,
185 "document_ref": "202",
186 "index_owner_user_id": 303,
187 "retriever_config": {
188 "name": "retriever-a",
189 "namespace": "default",
190 "storage_config": {
191 "type": "elasticsearch",
192 "url": "http://es:9200",
193 "indexStrategy": {"mode": "per_user", "prefix": "wegent"},
194 },
195 },
196 "enabled_index_families": ["chunk_vector"],
197 }
198 )
199
200 assert request.retriever_config.name == "retriever-a"
201 assert request.retriever_config.storage_config["type"] == "elasticsearch"
Mask transport-layer secrets in protocol models to prevent accidental exposure in logs and repr output.
BackendAttachmentStreamContentRef.auth_token (line 26) and KnowledgeRuntimeAuth.token (line 57) are plain str fields, so Pydantic's default serialization and __repr__ will expose live bearer/download tokens. Switch these to SecretStr or use field exclusion to prevent token leakage in request logs and debugging output. Verify that outbound HTTP calls still send the raw token value (via field access or custom serializer).
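A short sketch of the masking plus the explicit unwrap at the call site (pydantic v2; the model is abbreviated to the affected fields):

from pydantic import BaseModel, SecretStr


class BackendAttachmentStreamContentRef(BaseModel):
    url: str
    auth_token: SecretStr  # repr() and JSON dumps now show '**********'


ref = BackendAttachmentStreamContentRef(
    url="http://backend:8000/api/internal/rag/content/22",
    auth_token="live-download-token",
)
print(ref)  # url='http://backend:8000/...' auth_token=SecretStr('**********')

# Outbound HTTP calls must unwrap the secret explicitly:
headers = {"Authorization": f"Bearer {ref.auth_token.get_secret_value()}"}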
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@shared/models/knowledge_runtime_protocol.py` around lines 21 - 27, The auth
token fields currently stored as plain
strings—BackendAttachmentStreamContentRef.auth_token and
KnowledgeRuntimeAuth.token—must be changed to Pydantic-safe secret types or
excluded from repr/serialization to avoid leaking credentials; update those
fields to use pydantic.SecretStr (or annotate with Field(repr=False,
exclude=True)) in the KnowledgeRuntimeProtocolModel subclasses and adjust any
code that sends tokens (HTTP clients, headers builders) to call
.get_secret_value() (or the equivalent accessor) so outbound requests still send
the raw token value. Ensure model serialization/parse logic preserves
compatibility (e.g., accept plain strings on input) and verify unit tests or
callers that access token value are updated to use the SecretStr accessor.
599fe6b to 4b1dcc6 Compare
Actionable comments posted: 20
♻️ Duplicate comments (5)
docker/knowledge_runtime/Dockerfile (1)
5-29: ⚠️ Potential issue | 🟠 Major

Drop root before starting the service.

The container still runs as root, which increases breakout impact for a network-facing process. Create a dedicated user after the install steps and switch to it before CMD, making sure /app, /shared, and /knowledge_engine remain readable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docker/knowledge_runtime/Dockerfile` around lines 5 - 29, Create a non-root runtime user after the install steps in the Dockerfile (after RUN uv pip install ...) and ensure /app, /shared, and /knowledge_engine are owned/readable by that user; add a user/group (e.g., appuser), chown those directories to the new user, and switch to it with the USER instruction before the existing CMD that runs uvicorn (the CMD invoking "uvicorn knowledge_runtime.main:app ..."). Ensure permissions allow the service to read/write as needed while the image no longer runs as root.

knowledge_runtime/knowledge_runtime/core/config.py (1)
15-15: ⚠️ Potential issue | 🟠 Major

Make INTERNAL_SERVICE_TOKEN non-default.

A fixed fallback token weakens the internal auth boundary and is easy to miss in non-development deployments. Require the env var instead of silently accepting "knowledge-runtime-token".

Suggested fix
- INTERNAL_SERVICE_TOKEN: str = "knowledge-runtime-token"
+ INTERNAL_SERVICE_TOKEN: str

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_runtime/knowledge_runtime/core/config.py` at line 15, Replace the hardcoded fallback string for the module-level constant INTERNAL_SERVICE_TOKEN with a required environment lookup and fail fast if it's missing; specifically, stop assigning "knowledge-runtime-token" and instead read os.environ["INTERNAL_SERVICE_TOKEN"] (or use a helper that raises) and raise a clear RuntimeError/ValueError during import if the env var is not set so the service cannot start with a default token; update any initialization/tests that assumed the default to provide the env var.

backend/app/services/rag/runtime_resolver.py (2)
425-447: ⚠️ Potential issue | 🔴 Critical

Keep non-default model lookups tenant-scoped too.

The model_namespace != "default" branch drops the user_id / shared-model filter and ordering, so the first active model with that namespace/name wins across tenants. That can attach another tenant's model config and decrypted credentials to the runtime payload.

Based on learnings, "Kind resource is uniquely identified by THREE fields: namespace, name, and user_id - always use all three fields to locate a Kind".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/rag/runtime_resolver.py` around lines 425 - 447, The non-default branch in runtime_resolver.py currently queries Kind without scoping by user_id; update the second db.query(Kind) block (the branch where model_namespace != "default") to include the same tenant-scoping and ordering as the default branch: filter on (Kind.user_id == user_id) | (Kind.user_id == 0) and order_by(Kind.user_id.desc()) so the lookup uses the three identifying fields (Kind.namespace, Kind.name, Kind.user_id) and picks tenant-owned over shared before calling .first(); keep the existing filters for Kind.kind, Kind.name, Kind.namespace, and Kind.is_active.
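A sketch of the scoped lookup the prompt above describes, mirroring the default-namespace branch (the Kind import path and the "Model" kind value are assumptions):

from sqlalchemy import or_
from sqlalchemy.orm import Session

from app.models.kind import Kind  # import path assumed


def _find_model_kind(db: Session, model_name: str, model_namespace: str, user_id: int):
    return (
        db.query(Kind)
        .filter(
            Kind.kind == "Model",  # keep whatever kind filter the branch already uses
            Kind.name == model_name,
            Kind.namespace == model_namespace,
            or_(Kind.user_id == user_id, Kind.user_id == 0),  # tenant-owned or shared
            Kind.is_active.is_(True),
        )
        .order_by(Kind.user_id.desc())  # prefer the tenant's record over the shared one
        .first()
    )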
318-332: ⚠️ Potential issue | 🔴 Critical

Restore tenant-scoped knowledge-base lookup.

_get_knowledge_base_record() still resolves any active KnowledgeBase by id only, and the new query/public/delete builders all depend on it. That lets one tenant drive runtime-config resolution from another tenant's KB. Thread requester identity through this helper and use the tenant-aware lookup path instead.

Based on learnings, "Kind resource is uniquely identified by THREE fields: namespace, name, and user_id - always use all three fields to locate a Kind".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/rag/runtime_resolver.py` around lines 318 - 332, The helper _get_knowledge_base_record currently queries by Kind.id only; change it to be tenant-scoped by accepting the requester/tenant identity (e.g., add params for namespace and user_id or a requester object) and update the query to filter on the three unique Kind identifiers (Kind.namespace, Kind.name, Kind.user_id) in addition to Kind.kind == "KnowledgeBase" and Kind.is_active, then update all callers of _get_knowledge_base_record to thread the requester/tenant info so the lookup uses the tenant-aware path rather than resolving by id alone.

shared/models/knowledge_runtime_protocol.py (1)
21-27: ⚠️ Potential issue | 🟠 Major

Mask transport tokens in these protocol models.

auth_token and token are plain str, so repr(), validation errors, or model_dump() can expose live bearer/download tokens. Prefer SecretStr here and update the header builders to read the raw secret explicitly.

🛠️ Possible fix
-from pydantic import BaseModel, ConfigDict, Field
+from pydantic import BaseModel, ConfigDict, Field, SecretStr
@@
-    auth_token: str
+    auth_token: SecretStr = Field(repr=False)
@@
-    token: str
+    token: SecretStr = Field(repr=False)

Call sites that build Authorization headers will then need .get_secret_value().

Also applies to: 53-57
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@shared/models/knowledge_runtime_protocol.py` around lines 21 - 27, Change plain-string token fields to secure types: replace the auth_token: str in BackendAttachmentStreamContentRef and any other models that define token: str (see the other model around the 53-57 block) with SecretStr so secrets aren't exposed via repr()/model_dump()/validation errors, and update any header-building code that uses these models to call .get_secret_value() to obtain the raw token (e.g., where Authorization or download headers are constructed). Ensure imports for SecretStr are added and all call sites retrieving the token are updated accordingly.
🧹 Nitpick comments (17)
backend/app/services/rag/retrieval/filters.py (1)
12-12: Consider sorting __all__ for consistency.

Static analysis flags that __all__ is not sorted. This is a minor style issue.

🔧 Suggested fix
-__all__ = ["parse_metadata_filters", "build_elasticsearch_filters"]
+__all__ = ["build_elasticsearch_filters", "parse_metadata_filters"]

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/rag/retrieval/filters.py` at line 12, The module-level export list __all__ is unsorted; please reorder it alphabetically (e.g., ["build_elasticsearch_filters", "parse_metadata_filters"]) so __all__ is deterministic and satisfies static analysis; update the __all__ assignment near the top of the module (where __all__ is defined) to the sorted list while leaving the functions parse_metadata_filters and build_elasticsearch_filters untouched.backend/app/api/endpoints/adapter/retrievers.py (1)
244-244: Prefer underscore prefix over explicit del for unused parameters.

Using del current_user is unconventional. If the parameter is only needed for authentication but not used in the function body, consider prefixing with underscore or removing the check from type hints while keeping Depends.

🔧 Suggested alternative
@router.post("/test-connection")
async def test_retriever_connection(
    test_data: dict,
-    current_user: User = Depends(security.get_current_user),
+    _current_user: User = Depends(security.get_current_user),
):
    ...
    _check_rag_available()
-    del current_user

🤖 Prompt for AI Agents
`@router.post`("/test-connection") async def test_retriever_connection( test_data: dict, - current_user: User = Depends(security.get_current_user), + _current_user: User = Depends(security.get_current_user), ): ... _check_rag_available() - del current_user🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/endpoints/adapter/retrievers.py` at line 244, The explicit "del current_user" line is unconventional; instead rename the unused dependency parameter to start with an underscore (e.g., _current_user) in the function signature that currently contains "del current_user" (keep the Depends(...) so authentication still runs) and remove the "del current_user" statement; alternatively, if you prefer to keep the name, leave the parameter but prefix it with an underscore in that same function's signature to signal it's intentionally unused (update any type hints accordingly).knowledge_engine/tests/test_storage_factory.py (1)
39-46: Verify thattypekey exclusion is intentional.The assertion expects the config to not include the
"type"key (which was"fake"in input). This is correct if the factory strips backend type before passing to the constructor. However, if this is the expected behavior, consider adding an explicit comment or separate assertion to document this transformation.💡 Suggested enhancement for clarity
assert isinstance(backend, FakeBackend) + # Factory should strip 'type' key and normalize config before passing to backend assert captured["config"] == { "url": "http://vector-store:1234",🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_engine/tests/test_storage_factory.py` around lines 39 - 46, The test currently asserts that captured["config"] lacks the "type" key but doesn't document whether that omission is intentional; update the test in test_storage_factory (the test that sets up captured and asserts captured["config"]) to explicitly verify and document the transformation by either adding an assertion that the original input contained "type": "fake" and that the factory removed it, or by adding a concise comment explaining that StorageFactory strips the backend "type" before passing config to the constructor (reference the captured variable and the factory/create method used in the test).backend/app/services/rag/storage/elasticsearch_backend.py (1)
12-12: Sort__all__alphabetically.Per static analysis (RUF022), the exports should be sorted. This is a minor style consistency fix.
✏️ Suggested fix
-__all__ = ["ElasticsearchBackend", "Elasticsearch"] +__all__ = ["Elasticsearch", "ElasticsearchBackend"]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/rag/storage/elasticsearch_backend.py` at line 12, The __all__ export list is not alphabetized; update the module-level __all__ (currently ["ElasticsearchBackend", "Elasticsearch"]) to be sorted alphabetically so the exports read ["Elasticsearch", "ElasticsearchBackend"] to satisfy RUF022 and consistent style.knowledge_engine/knowledge_engine/embedding/factory.py (1)
30-80: Consider splitting _create_embedding_model_from_resolved_values into smaller helpers.

This function is just over the preferred limit and now mixes protocol routing plus object construction. Extracting protocol-specific builders will keep it easier to evolve safely.
As per coding guidelines, "Function length: Max 50 lines per function (preferred)."
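A sketch of one extracted builder, using the parameter names from the agent prompt below (the constructor keyword names are illustrative, not the exact knowledge_engine signatures):

def _build_openai_embedding(model_id, api_key, base_url, custom_headers, dimensions):
    # A custom base_url implies a self-hosted OpenAI-compatible endpoint,
    # so route it to CustomEmbedding; otherwise use the stock client.
    if base_url:
        return CustomEmbedding(
            model_name=model_id,
            api_key=api_key,
            base_url=base_url,
            custom_headers=custom_headers,
            dimensions=dimensions,
        )
    return OpenAIEmbedding(model=model_id, api_key=api_key, dimensions=dimensions)

_create_embedding_model_from_resolved_values then shrinks to pure protocol routing, with one branch per protocol delegating to its builder.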
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_engine/knowledge_engine/embedding/factory.py` around lines 30 - 80, The function _create_embedding_model_from_resolved_values is doing protocol routing and object construction; refactor by extracting protocol-specific builders: implement a helper _build_openai_embedding(protocol, model_id, api_key, base_url, custom_headers, dimensions) that returns either CustomEmbedding or OpenAIEmbedding for protocol "openai", and another helper _build_custom_protocol_embedding(protocol, model_name, model_id, api_key, base_url, custom_headers, dimensions) that validates base_url and returns CustomEmbedding for protocols in {"cohere","jina","custom"}; update _create_embedding_model_from_resolved_values to call those helpers for the corresponding protocol branches and keep the final unsupported-protocol ValueError in place.knowledge_engine/knowledge_engine/retrieval/filters.py (1)
20-98: Refactor parse_metadata_filters into smaller helpers.

parse_metadata_filters currently exceeds the preferred function-size guideline. Splitting the operator-map constants and condition parsing into helpers will improve maintainability.
parse_metadata_filterscurrently exceeds the preferred function-size guideline. Splitting operator-map/constants and condition parsing will improve maintainability.As per coding guidelines, "Function length: Max 50 lines per function (preferred)."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_engine/knowledge_engine/retrieval/filters.py` around lines 20 - 98, Split parse_metadata_filters into smaller helpers: move the OPERATOR_MAP constant out of parse_metadata_filters to a module-level constant (e.g., OPERATOR_MAP) and create a helper function to build the filters list (e.g., _build_filters_from_conditions(knowledge_id, metadata_condition)) that contains the loop creating ExactMatchFilter, MetadataFilter and mapping operators, plus a small helper (e.g., _get_condition_type(metadata_condition)) to determine the "and"/"or" string; then simplify parse_metadata_filters to call these helpers and return MetadataFilters(filters=..., condition=...), keeping the existing semantics and using ExactMatchFilter, MetadataFilter and MetadataFilters as before.backend/tests/services/rag/test_runtime_specs.py (1)
143-154: Single-case parameterized test could be simplified.

The parameterized test now has only one test case. Consider converting it to a regular test function for clarity, or document if more cases are planned.
-@pytest.mark.parametrize( - ("kwargs", "expected_message"), - [ - ( - {"source_type": "attachment"}, - "Field required", - ), - ], -) -def test_index_source_enforces_coherent_shape(kwargs, expected_message): - with pytest.raises(ValidationError, match=expected_message): - IndexSource(**kwargs) +def test_index_source_requires_attachment_id(): + with pytest.raises(ValidationError, match="Field required"): + IndexSource(source_type="attachment")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/tests/services/rag/test_runtime_specs.py` around lines 143 - 154, The test test_index_source_enforces_coherent_shape currently uses pytest.mark.parametrize with a single case; simplify by removing the `@pytest.mark.parametrize` decorator and directly call IndexSource(**{"source_type": "attachment"}) inside the with pytest.raises(ValidationError, match="Field required") block (or alternatively expand to more cases if you intend to add them), keeping the test name and the assertion on IndexSource the same.knowledge_engine/knowledge_engine/storage/factory.py (1)
28-76: Consider extracting duplicated validation and config-building logic.

Both create_storage_backend_from_config and create_storage_backend_from_runtime_config duplicate the validation check and config dictionary construction. Extract shared logic to reduce duplication.

♻️ Suggested refactor
create_storage_backend_from_configandcreate_storage_backend_from_runtime_configduplicate the validation check and config dictionary construction. Extract shared logic to reduce duplication.♻️ Suggested refactor
+def _validate_storage_type(storage_type: str) -> str: + normalized = storage_type.lower() + if normalized not in STORAGE_BACKEND_REGISTRY: + raise ValueError( + f"Unsupported storage type: {normalized or '<missing>'}. " + f"Supported types: {list(STORAGE_BACKEND_REGISTRY.keys())}" + ) + return normalized + + +def _build_backend_config( + url: Optional[str], + username: Optional[str], + password: Optional[str], + api_key: Optional[str], + index_strategy: Optional[Dict[str, Any]], + ext: Optional[Dict[str, Any]], +) -> Dict[str, Any]: + return { + "url": url, + "username": username, + "password": password, + "apiKey": api_key, + "indexStrategy": index_strategy or {"mode": "per_dataset"}, + "ext": ext or {}, + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_engine/knowledge_engine/storage/factory.py` around lines 28 - 76, Both functions create_storage_backend_from_config and create_storage_backend_from_runtime_config duplicate the same validation against STORAGE_BACKEND_REGISTRY and construction of the config dict; refactor by extracting a small helper (e.g., _validate_and_build_storage_config or _get_storage_backend_class_and_config) that accepts the raw inputs (storage_type and a mapping or explicit params) and returns the validated backend_class and the normalized config dict (with keys "url","username","password","apiKey","indexStrategy","ext"); then replace the bodies of create_storage_backend_from_config and create_storage_backend_from_runtime_config to call that helper and simply return backend_class(config). Ensure the helper handles lowercasing the type, raises the same ValueError message when missing/unsupported, and applies defaults for indexStrategy and ext.backend/app/api/endpoints/internal/rag_content.py (1)
79-85: Loading full attachment content into memory may cause OOM for large files.

Using Response(content=binary_data) loads the entire attachment into memory. For large files, consider using StreamingResponse with chunked reading to avoid memory exhaustion.

♻️ Suggested improvement using StreamingResponse
Response(content=binary_data)loads the entire attachment into memory. For large files, consider usingStreamingResponsewith chunked reading to avoid memory exhaustion.♻️ Suggested improvement using StreamingResponse
+from fastapi.responses import StreamingResponse +import io - return Response( - content=binary_data, - media_type=context.mime_type, - headers={ - "Content-Disposition": _build_content_disposition(context.original_filename) - }, - ) + return StreamingResponse( + io.BytesIO(binary_data), + media_type=context.mime_type, + headers={ + "Content-Disposition": _build_content_disposition(context.original_filename) + }, + )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/endpoints/internal/rag_content.py` around lines 79 - 85, The current return uses Response(content=binary_data) which loads the entire file into memory; change this to use StreamingResponse (from starlette.responses) and stream the attachment in chunks instead of passing binary_data directly. Implement a generator/iterable that reads the file-like source in chunks (or wraps bytes in io.BytesIO and yields chunks) and pass that generator to StreamingResponse with media_type=context.mime_type and headers={"Content-Disposition": _build_content_disposition(context.original_filename)}; update the return in the same place where Response, binary_data, context.mime_type, and _build_content_disposition are referenced.knowledge_runtime/knowledge_runtime/api/internal/rag.py (1)
20-49: Consider adding response models for consistency.

The query endpoint uses response_model=RemoteQueryResponse, but other endpoints return generic dict. Adding response models improves API documentation and type safety.
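A sketch of the pattern, with a hypothetical response model mirroring the dict the handler currently returns:

from fastapi import APIRouter
from pydantic import BaseModel

router = APIRouter()


class RemoteIndexResponse(BaseModel):
    # Hypothetical shape; mirror whatever the handler actually returns.
    status: str
    knowledge_id: str
    document_id: int


@router.post("/index", response_model=RemoteIndexResponse)
async def index_document() -> RemoteIndexResponse:
    # response_model buys OpenAPI documentation plus output validation.
    return RemoteIndexResponse(status="accepted", knowledge_id="11", document_id=22)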
queryendpoint usesresponse_model=RemoteQueryResponse, but other endpoints return genericdict. Adding response models improves API documentation and type safety.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_runtime/knowledge_runtime/api/internal/rag.py` around lines 20 - 49, The endpoints index_document, delete_document_index, and test_connection return generic dicts while query uses response_model=RemoteQueryResponse; update each route decorator to include the appropriate response_model (e.g., response_model=RemoteIndexResponse for index_document, response_model=RemoteDeleteDocumentIndexResponse for delete_document_index, and response_model=RemoteTestConnectionResponse for test_connection) and change the function return type annotations to those concrete response model classes; ensure you import the model classes and keep the handlers calls (runtime_handlers.index_document, runtime_handlers.delete_document_index, runtime_handlers.test_connection) unchanged so they return/serialize to the specified models for consistent docs and typing.backend/app/api/endpoints/rag.py (1)
107-108: Consider logging the remote gateway error before falling back.

The fallback to LocalRagGateway silently swallows the RemoteRagGatewayError. Adding a log statement helps with debugging and monitoring fallback frequency.

♻️ Suggested improvement
LocalRagGatewaysilently swallows theRemoteRagGatewayError. Adding a log statement helps with debugging and monitoring fallback frequency.♻️ Suggested improvement
+import logging + +logger = logging.getLogger(__name__) + try: result = await gateway.query(runtime_spec, db=db) except RemoteRagGatewayError: + logger.warning("Remote RAG gateway failed, falling back to local gateway") result = await LocalRagGateway().query(runtime_spec, db=db)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/endpoints/rag.py` around lines 107 - 108, When catching RemoteRagGatewayError in the endpoint, log the caught exception before falling back to LocalRagGateway: update the except RemoteRagGatewayError block to record the error (e.g., logger.error or logger.exception) including the exception details and context (runtime_spec or other relevant identifiers) and then call await LocalRagGateway().query(runtime_spec, db=db); reference RemoteRagGatewayError, LocalRagGateway, query, runtime_spec and db to locate the change.backend/app/services/rag/runtime_specs.py (1)
5-17: Reuse the shared RetrievalPolicy alias here.

This redeclares the same literal union that already exists in shared.models. Keeping two copies makes local/remote mode drift likely the next time a policy is added or renamed.
shared.models. Keeping two copies makes local/remote mode drift likely the next time a policy is added or renamed.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/rag/runtime_specs.py` around lines 5 - 17, The local redeclaration of RetrievalPolicy should be removed and the alias imported from shared.models to avoid drift; replace the Literal[...] block with an import of RetrievalPolicy from shared.models (same module that defines RemoteKnowledgeBaseQueryConfig, RuntimeEmbeddingModelConfig, RuntimeRetrievalConfig, RuntimeRetrieverConfig) and update any local references to use that imported RetrievalPolicy symbol.knowledge_runtime/knowledge_runtime/services/handlers.py (1)
28-110: Instrument the new remote handler entrypoints.

index_document(), query(), delete_document_index(), and test_connection() are the main runtime boundaries, but none of them open a top-level span or attach request identifiers. That will make remote-mode failures much harder to correlate.
index_document(),query(),delete_document_index(), andtest_connection()are the main runtime boundaries, but none of them open a top-level span or attach request identifiers. That will make remote-mode failures much harder to correlate.As per coding guidelines, "Use
@trace_async(span_name, tracer_name, extract_attributes)decorator to trace entire async functions in OpenTelemetry".🤖 Prompt for AI Agents
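A sketch of the decorated entrypoint, taking the decorator's signature from the guideline above (the import path and attribute keys are assumptions):

from shared.telemetry import trace_async  # import path assumed

from shared.models import RemoteQueryRequest, RemoteQueryResponse


def _query_span_attributes(request) -> dict:
    # Attach request identifiers so remote-mode failures can be correlated.
    return {
        "rag.knowledge_base_ids": str(request.knowledge_base_ids),
        "rag.max_results": request.max_results,
    }


class RuntimeHandlers:
    @trace_async("rag.query", "knowledge_runtime", _query_span_attributes)
    async def query(self, request: RemoteQueryRequest) -> RemoteQueryResponse:
        ...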
Verify each finding against the current code and only fix it if needed. In `@knowledge_runtime/knowledge_runtime/services/handlers.py` around lines 28 - 110, The four remote handler methods (index_document, query, delete_document_index, test_connection) lack OpenTelemetry tracing; wrap each async method with the `@trace_async` decorator (using an appropriate span_name and tracer_name) and provide an extract_attributes callable that pulls request identifiers and key context (e.g., request.request_id or request.id, request.knowledge_base_id/knowledge_base_configs items, request.index_owner_user_id, document_id/document_ref, query text or retrieval_config identifiers) so every top-level entrypoint opens a span and attaches those attributes for correlation; ensure the decorator is imported/available and apply it to each method signature.knowledge_runtime/tests/test_internal_rag.py (1)
218-227: Add a success-path /internal/rag/query test.

The only query coverage here is the 401 branch. The new aggregation path still has no regression coverage for record merging, sorting, truncation, or document_id / index_family shaping.
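A sketch of such a test, reusing the create_app factory and the default bearer token (both assumed from the surrounding tests; the record-seeding setup is elided):

from fastapi.testclient import TestClient

from knowledge_runtime.main import create_app  # factory location assumed


def test_query_success_returns_shaped_records():
    client = TestClient(create_app())
    response = client.post(
        "/internal/rag/query",
        headers={"Authorization": "Bearer knowledge-runtime-token"},  # token value assumed
        json={"knowledge_base_ids": [1001], "query": "release checklist"},
    )
    assert response.status_code == 200
    body = response.json()
    # Aggregation contract: totals agree, records are sorted by score,
    # and every record carries document_id / index_family.
    assert body["total"] == len(body["records"])
    scores = [record["score"] for record in body["records"]]
    assert scores == sorted(scores, reverse=True)
    for record in body["records"]:
        assert "document_id" in record and "index_family" in record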
Verify each finding against the current code and only fix it if needed. In `@knowledge_runtime/tests/test_internal_rag.py` around lines 218 - 227, Add a new success-path test in knowledge_runtime/tests/test_internal_rag.py that uses TestClient(create_app()) to POST to "/internal/rag/query" with a valid Authorization header and a payload similar to the existing test; seed or mock the underlying retrieval/aggregation layer (or use test fixtures) so the aggregation path runs and assert response.status_code == 200 and response.json() contains the expected merged records; specifically verify that records are merged correctly across sources, sorted by relevance, truncated to any configured max_results, and that each returned item includes the shaped fields "document_id" and "index_family" (use the same helper/mocking utilities as other tests to locate functions interacting with the retrieval/aggregation pipeline and validate merging, sorting, and truncation behavior).knowledge_engine/knowledge_engine/storage/milvus_backend.py (2)
59-61: Use immutable type annotation to avoid mutable class attribute default.

The class attribute _milvusclient_config has a mutable default {}. While it's reassigned in __init__, this pattern can cause subtle bugs if subclasses don't call super().__init__(). Declare it as a type annotation only.

♻️ Proposed fix
_milvusclient_confighas a mutable default{}. While it's reassigned in__init__, this pattern can cause subtle bugs if subclasses don't callsuper().__init__(). Declare it as a type annotation only.♻️ Proposed fix
- # Store config for lazy async client creation - _milvusclient_config: Dict[str, Any] = {} + # Store config for lazy async client creation (initialized in __init__) + _milvusclient_config: Dict[str, Any]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_engine/knowledge_engine/storage/milvus_backend.py` around lines 59 - 61, The class attribute _milvusclient_config is defined with a mutable default {} which can lead to shared-state bugs; change it to a type-only annotation (e.g., _milvusclient_config: Dict[str, Any]) without assigning {} in the class body and ensure the instance-specific dict is created in __init__ (or where the lazy async client configuration is set) so subclasses that omit super().__init__ won't accidentally share state.
720-726: Consider logging client close failures for debugging.

The try-except-pass pattern when closing the client silently swallows any errors. While unlikely, connection close failures can indicate resource issues. A debug log would help with troubleshooting.

♻️ Proposed fix
try-except-passpattern when closing the client silently swallows any errors. While unlikely, connection close failures can indicate resource issues. A debug log would help with troubleshooting.♻️ Proposed fix
finally: # Ensure client is closed to avoid connection leaks if client: try: client.close() - except Exception: - pass + except Exception as e: + logger.debug(f"[Milvus] Error closing client: {e}")The same pattern appears in
test_connection(lines 743-749) andget_all_chunks(lines 823-829).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_engine/knowledge_engine/storage/milvus_backend.py` around lines 720 - 726, The current client.close() calls in the finally blocks of the methods use a try-except-pass pattern that silently ignores exceptions. To improve debugging, update these blocks in the methods containing client.close() (including the finally blocks in the current method, test_connection, and get_all_chunks) to catch exceptions and log the failure using a debug or error log instead of passing silently. This will help surface connection close failures without interrupting normal operation.knowledge_engine/knowledge_engine/storage/qdrant_backend.py (1)
563-570: Move logging import to module level.

The logging module is imported inside the exception handler, which is unconventional. Import it at the module level alongside other imports and create a module-level logger for consistency with other backend implementations.

♻️ Proposed fix
loggingmodule is imported inside the exception handler, which is unconventional. Import it at the module level alongside other imports and create a module-level logger for consistency with other backend implementations.♻️ Proposed fix
Add at the top of the file (after line 19):
import loggingAdd after imports (around line 35):
logger = logging.getLogger(__name__)Then update the exception handler:
except Exception as e: # Log error but return empty list to allow fallback to RAG - import logging - - logger = logging.getLogger(__name__) logger.warning( f"[Qdrant] Failed to get all chunks for KB {knowledge_id}: {e}" ) return []🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_engine/knowledge_engine/storage/qdrant_backend.py` around lines 563 - 570, The logging import is done inside an exception handler; move "import logging" to the module level and create a module-level logger (e.g., logger = logging.getLogger(__name__)) near the other imports so it's consistent with other backends, then update the except block that currently references a locally-imported logger to use the module-level logger (the block that logs "[Qdrant] Failed to get all chunks for KB {knowledge_id}: {e}").
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/api/endpoints/rag.py`:
- Around line 113-116: The except blocks in rag.py currently re-raise
HTTPException without chaining the original exceptions, losing the original
tracebacks; update both exception handlers in the function containing these
blocks to use exception chaining (e.g., raise HTTPException(status_code=400,
detail=str(e)) from e and raise HTTPException(status_code=500, detail=str(e))
from e) so the original traceback is preserved for debugging—locate the
ValueError and generic Exception handlers (the except ValueError as e and except
Exception as e blocks) and add "from e" to each raised HTTPException.
In `@backend/app/core/config.py`:
- Around line 238-267: Update the parse_rag_runtime_mode validator to enforce
only "local" or "remote" values: when v is None return "local"; when v is a
Mapping parse keys/values to lowercased strings but validate each resulting
value is exactly "local" or "remote" and raise ValueError for any invalid entry;
when v is a string trim and, if JSON, decode to a Mapping and validate as above,
otherwise ensure the final lowercased string equals "local" or "remote" and
raise ValueError if not; apply the same strict validation logic to the other
validator referenced around lines 543-550 so both parse/field_validator paths
only accept "local" or "remote" and fail-fast on misconfiguration.
In `@backend/app/services/rag/local_data_plane/administration.py`:
- Around line 15-26: The new async executor test_connection_local lacks
OpenTelemetry tracing; add the `@trace_async` decorator to the async function (use
trace_async with a clear span name like "administration.test_connection_local"
and the appropriate tracer name, e.g., "backend") and supply an
extract_attributes callable that records useful inputs (e.g.,
spec.retriever_config or other non-sensitive attributes) so the whole async
boundary is traced; update the function definition for test_connection_local to
be decorated with trace_async accordingly.
In `@backend/app/services/rag/local_data_plane/indexing.py`:
- Around line 164-180: The helper functions _build_index_storage_backend and
_build_index_embed_model are missing return type annotations; update their
signatures to include appropriate return types (e.g., the concrete storage
backend type returned by
create_storage_backend/create_storage_backend_from_runtime_config and the embed
model type returned by whatever create_embed_model/_embed_model_factory returns)
so static typing is satisfied; locate the functions named
_build_index_storage_backend and _build_index_embed_model in indexing.py and add
explicit return type hints matching the actual classes/interfaces they return
(or use a suitable Protocol/Union/typing.Any if multiple types are possible),
and update any imports if necessary to reference those types.
In `@backend/app/services/rag/retrieval/__init__.py`:
- Line 12: The __all__ export list in the module is unsorted which triggers Ruff
RUF022; update the __all__ declaration to list the exported names in sorted
order, e.g., ensure "__all__ = ['build_elasticsearch_filters',
'parse_metadata_filters']" so the symbol names (build_elasticsearch_filters and
parse_metadata_filters) are alphabetized.
In `@backend/tests/api/endpoints/test_rag_api.py`:
- Around line 66-76: The test uses patch(..., create=True) when mocking
build_public_query_runtime_spec which allows silent creation of the attribute;
remove the create=True argument from the patch call that targets
build_public_query_runtime_spec so the patch validates the real import path
(i.e., update the patch invocation for
"app.api.endpoints.rag.runtime_resolver.build_public_query_runtime_spec" to omit
create=True), keeping the rest of the mock setup (return_value=runtime_spec and
the surrounding with(...) context) unchanged.
In `@backend/tests/api/endpoints/test_retrievers_api.py`:
- Around line 71-75: The test asserts an inconsistent validation message: the
request includes storage_type but the expected response lists both storage_type
and url as missing. Update the assertion in
backend/tests/api/endpoints/test_retrievers_api.py to match the actual request
payload—either change the expected message to "Missing required fields: url"
when storage_type is provided, or modify the request to omit storage_type if you
intend to assert both are missing; update the response.json() expected dict
accordingly (reference the response variable and the failing assertion block).
In `@docker-compose.yml`:
- Around line 79-84: Remove the predictable fallback for INTERNAL_SERVICE_TOKEN
and stop exposing the service port to the host by default: require the env var
INTERNAL_SERVICE_TOKEN to be set (do not provide a default value) and remove or
comment out the host port binding "${KNOWLEDGE_RUNTIME_PORT:-8200}:8200" so the
container only listens on its internal port (keep PORT: 8200 and HOST: 0.0.0.0
if needed for internal network access), and ensure similar changes are applied
to the other occurrence of the same mapping and env (lines referenced 133-134).
Update README or compose docs to instruct operators to explicitly set
INTERNAL_SERVICE_TOKEN and host port mapping when they intentionally want host
exposure.
In `@docker/knowledge_runtime/Dockerfile`:
- Around line 13-18: The Dockerfile currently copies pyproject.toml and runs "uv
pip install --system --no-cache -r pyproject.toml", which bypasses UV's project
management; instead copy the full project (COPY knowledge_runtime /app) and
install via UV from that directory (e.g., RUN uv pip install /app or the UV "uv
install" workflow) so lockfiles and reproducible installs are respected—update
the RUN line that references "uv pip install --system --no-cache -r
pyproject.toml" to the idiomatic "uv pip install /app" (or "uv install") and
ensure pyproject.toml remains inside /app before running the command.
In `@knowledge_engine/knowledge_engine/embedding/custom.py`:
- Around line 72-73: The current code assumes
response.json()["data"][0]["embedding"] always exists; add defensive validation
after response.raise_for_status(): parse body = response.json(), verify body is
a dict, that "data" is present and is a non-empty list, and that body["data"][0]
contains "embedding"; if any check fails, raise a clear ValueError (or custom
exception) including a descriptive message and the raw response text/JSON to aid
debugging instead of letting a KeyError/IndexError surface; ensure the code
still assigns embedding when validations pass (the lines around
response.raise_for_status(), response.json(), and the embedding variable are the
places to update).
- Around line 61-77: The retry decorator on _call_api is currently
unconditional; change it to only retry transient network/HTTP exceptions by
adding a retry= parameter (e.g.,
retry=retry_if_exception_type((requests.exceptions.ConnectionError,
requests.exceptions.Timeout, requests.exceptions.HTTPError))) so
JSONDecodeError/KeyError are not retried; update the `@retry`(...) invocation that
decorates _call_api to include that retry predicate and import the required
tenacity helper(s) and requests exception types.
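For illustration, a minimal sketch of the narrowed retry policy this prompt describes, using tenacity's retry_if_exception_type so only transient transport failures are retried; the endpoint URL, payload shape, and backoff numbers here are placeholders rather than the project's actual client:

```python
import requests
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

# Retry only transient transport failures; JSON decode or key errors surface immediately.
TRANSIENT_ERRORS = (
    requests.exceptions.ConnectionError,
    requests.exceptions.Timeout,
    requests.exceptions.HTTPError,
)


@retry(
    retry=retry_if_exception_type(TRANSIENT_ERRORS),
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, max=10),
)
def _call_api(url: str, payload: dict) -> dict:
    response = requests.post(url, json=payload, timeout=30)
    response.raise_for_status()  # HTTPError raised here is retried
    return response.json()  # a JSONDecodeError here is not retried
```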
- Around line 58-59: The class defines async _aget_query_embedding but lacks the
required async _aget_text_embedding needed by BaseEmbedding; implement an async
def _aget_text_embedding(self, text: str) -> list[float] that delegates to the
synchronous _get_text_embedding via asyncio.to_thread (mirroring
_aget_query_embedding) so async callers (e.g., aget_text_embedding_batch) work
correctly and avoid runtime errors.
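A minimal sketch of the delegation pattern this prompt asks for; only the sync/async pair is shown, and the surrounding BaseEmbedding subclass body is elided:

```python
import asyncio


class CustomEmbeddingFragment:
    """Illustrative fragment: sync embedding call plus its async twin."""

    def _get_text_embedding(self, text: str) -> list[float]:
        # Synchronous HTTP call to the embedding endpoint (elided here).
        raise NotImplementedError

    async def _aget_text_embedding(self, text: str) -> list[float]:
        # Delegate to the sync path without blocking the event loop,
        # mirroring how _aget_query_embedding already delegates.
        return await asyncio.to_thread(self._get_text_embedding, text)
```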
In `@knowledge_engine/knowledge_engine/embedding/factory.py`:
- Around line 40-62: The OpenAI branch in factory.py currently hardcodes
"text-embedding-3-small" when model_id is falsy; change the fallback to prefer
runtime_config.model_name before the hardcoded default. Update both the
CustomEmbedding and OpenAIEmbedding invocations (the places constructing
model=... / model=... or model=... param) to use model_id or
runtime_config.model_name or "text-embedding-3-small". Ensure runtime_config is
referenced/imported where needed and use the same fallback logic in the
CustomEmbedding and OpenAIEmbedding construction sites.
In `@knowledge_engine/knowledge_engine/query/executor.py`:
- Around line 53-57: The retrieval_setting values are taken directly from config
using config.get(..., default) which doesn’t replace keys that exist with None;
update the logic that builds retrieval_setting (the retrieval_setting dict in
executor.py) to normalize None entries first by treating None the same as
missing: for each key ("top_k", "score_threshold", "retrieval_mode") read
config.get(key) and if the returned value is None substitute the intended
default (20, 0.7, "vector") before assigning into retrieval_setting so
downstream code never receives None for these fields.
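A small sketch of the None-coalescing normalization described above; the key names and defaults come straight from the comment, while `config` stands in for whatever mapping the executor actually receives:

```python
def build_retrieval_setting(config: dict) -> dict:
    """Treat an explicit None the same as a missing key."""
    defaults = {"top_k": 20, "score_threshold": 0.7, "retrieval_mode": "vector"}
    setting = {}
    for key, default in defaults.items():
        value = config.get(key)
        setting[key] = default if value is None else value
    return setting


# config.get(key, default) alone would NOT catch this case:
assert build_retrieval_setting({"top_k": None})["top_k"] == 20
assert build_retrieval_setting({})["retrieval_mode"] == "vector"
```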
In `@knowledge_engine/knowledge_engine/retrieval/filters.py`:
- Around line 118-142: The operator variable is compared against lowercase
literal branches but isn't normalized, so mixed-case operators like "GTE" are
ignored; update the loop that processes metadata_condition["conditions"] to
normalize operator to lowercase (e.g., replace the current operator =
cond.get("operator", "eq") with obtaining it and calling .lower() safely) before
the series of if/elif checks (the block that appends filters for "eq", "ne",
"in", "nin", "gt", "gte", "lt", "lte", "contains", "text_match"); ensure you
guard against None when calling .lower() so existing checks for key and value
remain correct.
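A sketch of the safe normalization step; the helper name is illustrative:

```python
def normalize_operator(cond: dict) -> str:
    """Lowercase the operator defensively so 'GTE' matches the 'gte' branch."""
    raw = cond.get("operator") or "eq"  # guards against both None and a missing key
    return str(raw).lower()


assert normalize_operator({"operator": "GTE"}) == "gte"
assert normalize_operator({"operator": None}) == "eq"
assert normalize_operator({}) == "eq"
```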
In `@knowledge_engine/knowledge_engine/storage/elasticsearch_backend.py`:
- Around line 468-498: list_documents() currently calls Elasticsearch.search
unconditionally which raises when the index doesn't exist; update list_documents
(use index_name and es_client created via Elasticsearch(self.url,
**self.es_kwargs)) to first check for the index's existence (eg.
es_client.indices.exists(index=index_name)) or catch the Elasticsearch
NotFoundError around the es_client.search(...) call and return an empty
paginated result when the index is missing so the function returns an empty page
instead of raising.
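A rough shape for the guarded lookup, assuming the elasticsearch-py 8.x client; the empty-page structure is illustrative and should match whatever list_documents actually returns:

```python
from elasticsearch import Elasticsearch, NotFoundError


def list_documents_safe(es_client: Elasticsearch, index_name: str) -> dict:
    """Return an empty page instead of raising when the index is absent."""
    if not es_client.indices.exists(index=index_name):
        return {"documents": [], "total": 0}
    try:
        response = es_client.search(index=index_name, query={"match_all": {}})
    except NotFoundError:
        # Race: the index was deleted between the exists() check and the search.
        return {"documents": [], "total": 0}
    hits = response["hits"]["hits"]
    return {"documents": hits, "total": len(hits)}
```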
In `@knowledge_engine/knowledge_engine/storage/factory.py`:
- Around line 56-58: The function create_storage_backend_from_runtime_config
lacks a return type hint; update its signature to include the concrete storage
interface (for example -> StorageBackend) and import that type from the module
that defines the storage backend interface (e.g., the StorageBackend or
BaseStorage class in your storage backends module). Ensure the annotation
matches the actual object returned by create_storage_backend_from_runtime_config
(or use a Union of concrete backend classes if it can return multiple types).
In `@knowledge_runtime/knowledge_runtime/services/handlers.py`:
- Around line 148-156: The RemoteQueryRecord construction currently hard-codes
index_family="chunk_vector", which mislabels non-chunk results; update where
RemoteQueryRecord is created (the RemoteQueryRecord(...) call) to derive
index_family from the engine result/record (e.g., a field on record that denotes
the family) instead of using the literal "chunk_vector", and if the
engine/record does not supply a supported index family, raise or reject the
request (or explicitly handle mapping from engine types to allowed values) so
enabled_index_families / retrieval_policy remain consistent with results.
- Around line 176-180: The code is using the URL basename
(parsed_url/PurePosixPath(file_name)) as a source filename which yields
meaningless attachment IDs for backend_attachment_stream URLs; change the branch
in the function that handles request.content_ref.url so that after extracting
file_name you detect and reject non-meaningful names (e.g., numeric IDs or paths
matching "/content/{attachment_id}" or where file_name has no extension) and
instead use the provided fallback/source name (e.g., request.content_ref.name or
the function's fallback parameter) as the source_file; still compute suffix from
a real filename when present, otherwise set suffix to an empty string or infer
from fallback, and return the chosen source_file and suffix (update any
references to parsed_url, file_name, PurePosixPath to reflect this logic).
In `@shared/models/knowledge_runtime_protocol.py`:
- Around line 142-150: The model currently defaults knowledge_base_configs to an
empty list (Field(default_factory=list)), allowing malformed remote queries to
validate; change this so requests fail fast by making knowledge_base_configs
required (remove the default) or keep the field but add validation: implement a
Pydantic validator (e.g., `@root_validator` or
`@validator`('knowledge_base_configs')) in the same model to ensure
knowledge_base_configs is non-empty and that its entries
(RemoteKnowledgeBaseQueryConfig) correspond to knowledge_base_ids (lengths match
and each config's id or identifier matches an entry in knowledge_base_ids),
raising a ValueError if the check fails.
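A minimal pydantic sketch of that fail-fast validation. The comment suggests @root_validator/@validator (pydantic v1 names); on pydantic v2 the equivalent is model_validator, used below. The knowledge_base_id field on the config model is an assumption about its shape:

```python
from pydantic import BaseModel, model_validator


class RemoteKnowledgeBaseQueryConfig(BaseModel):
    knowledge_base_id: str  # assumed identifier field


class RemoteQueryRequest(BaseModel):
    knowledge_base_ids: list[str]
    knowledge_base_configs: list[RemoteKnowledgeBaseQueryConfig]

    @model_validator(mode="after")
    def check_configs_match_ids(self) -> "RemoteQueryRequest":
        if not self.knowledge_base_configs:
            raise ValueError("knowledge_base_configs must not be empty")
        config_ids = {c.knowledge_base_id for c in self.knowledge_base_configs}
        if config_ids != set(self.knowledge_base_ids):
            raise ValueError("knowledge_base_configs must match knowledge_base_ids")
        return self
```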
---
Duplicate comments:
In `@backend/app/services/rag/runtime_resolver.py`:
- Around line 425-447: The non-default branch in runtime_resolver.py currently
queries Kind without scoping by user_id; update the second db.query(Kind) block
(the branch where model_namespace != "default") to include the same
tenant-scoping and ordering as the default branch: filter on (Kind.user_id ==
user_id) | (Kind.user_id == 0) and order_by(Kind.user_id.desc()) so the lookup
uses the three identifying fields (Kind.namespace, Kind.name, Kind.user_id) and
picks tenant-owned over shared before calling .first(); keep the existing
filters for Kind.kind, Kind.name, Kind.namespace, and Kind.is_active.
- Around line 318-332: The helper _get_knowledge_base_record currently queries
by Kind.id only; change it to be tenant-scoped by accepting the requester/tenant
identity (e.g., add params for namespace and user_id or a requester object) and
update the query to filter on the three unique Kind identifiers (Kind.namespace,
Kind.name, Kind.user_id) in addition to Kind.kind == "KnowledgeBase" and
Kind.is_active, then update all callers of _get_knowledge_base_record to thread
the requester/tenant info so the lookup uses the tenant-aware path rather than
resolving by id alone.
In `@docker/knowledge_runtime/Dockerfile`:
- Around line 5-29: Create a non-root runtime user after the install steps in
the Dockerfile (after RUN uv pip install ...) and ensure /app, /shared, and
/knowledge_engine are owned/readable by that user; add a user/group (e.g.,
appuser), chown those directories to the new user, and switch to it with the
USER instruction before the existing CMD that runs uvicorn (the CMD invoking
"uvicorn knowledge_runtime.main:app ..."). Ensure permissions allow the service
to read/write as needed while the image no longer runs as root.
In `@knowledge_runtime/knowledge_runtime/core/config.py`:
- Line 15: Replace the hardcoded fallback string for the module-level constant
INTERNAL_SERVICE_TOKEN with a required environment lookup and fail fast if it's
missing; specifically, stop assigning "knowledge-runtime-token" and instead read
os.environ["INTERNAL_SERVICE_TOKEN"] (or use a helper that raises) and raise a
clear RuntimeError/ValueError during import if the env var is not set so the
service cannot start with a default token; update any initialization/tests that
assumed the default to provide the env var.
In `@shared/models/knowledge_runtime_protocol.py`:
- Around line 21-27: Change plain-string token fields to secure types: replace
the auth_token: str in BackendAttachmentStreamContentRef and any other models
that define token: str (see the other model around the 53-57 block) with
SecretStr so secrets aren't exposed via repr()/model_dump()/validation errors,
and update any header-building code that uses these models to call
.get_secret_value() to obtain the raw token (e.g., where Authorization or
download headers are constructed). Ensure imports for SecretStr are added and
all call sites retrieving the token are updated accordingly.
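A compact sketch of the SecretStr pattern; the url field and the header-builder helper are illustrative:

```python
from pydantic import BaseModel, SecretStr


class BackendAttachmentStreamContentRef(BaseModel):
    url: str
    auth_token: SecretStr  # masked in repr() and JSON serialization


def build_download_headers(ref: BackendAttachmentStreamContentRef) -> dict[str, str]:
    # Unwrap only at the transport edge, right before the HTTP call.
    return {"Authorization": f"Bearer {ref.auth_token.get_secret_value()}"}


ref = BackendAttachmentStreamContentRef(
    url="https://example.invalid/content/1", auth_token="s3cret"
)
assert "s3cret" not in repr(ref)
assert build_download_headers(ref)["Authorization"] == "Bearer s3cret"
```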
---
Nitpick comments:
In `@backend/app/api/endpoints/adapter/retrievers.py`:
- Line 244: The explicit "del current_user" line is unconventional; instead
rename the unused dependency parameter to start with an underscore (e.g.,
_current_user) in the function signature that currently contains "del
current_user" (keep the Depends(...) so authentication still runs) and remove
the "del current_user" statement; alternatively, if you prefer to keep the name,
leave the parameter but prefix it with an underscore in that same function's
signature to signal it's intentionally unused (update any type hints
accordingly).
In `@backend/app/api/endpoints/internal/rag_content.py`:
- Around line 79-85: The current return uses Response(content=binary_data) which
loads the entire file into memory; change this to use StreamingResponse (from
starlette.responses) and stream the attachment in chunks instead of passing
binary_data directly. Implement a generator/iterable that reads the file-like
source in chunks (or wraps bytes in io.BytesIO and yields chunks) and pass that
generator to StreamingResponse with media_type=context.mime_type and
headers={"Content-Disposition":
_build_content_disposition(context.original_filename)}; update the return in the
same place where Response, binary_data, context.mime_type, and
_build_content_disposition are referenced.
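A sketch of the streaming shape, with the Content-Disposition helper inlined as a plain f-string for illustration. Note that when the bytes are already fully in memory, this only paces the response; the real win comes from streaming the underlying source instead of materializing binary_data first:

```python
import io
from collections.abc import Iterator

from starlette.responses import StreamingResponse

CHUNK_SIZE = 64 * 1024


def iter_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield the payload in fixed-size chunks instead of one large body."""
    buffer = io.BytesIO(data)
    while chunk := buffer.read(chunk_size):
        yield chunk


def stream_attachment(binary_data: bytes, mime_type: str, filename: str) -> StreamingResponse:
    return StreamingResponse(
        iter_chunks(binary_data),
        media_type=mime_type,
        headers={"Content-Disposition": f'attachment; filename="{filename}"'},
    )
```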
In `@backend/app/api/endpoints/rag.py`:
- Around line 107-108: When catching RemoteRagGatewayError in the endpoint, log
the caught exception before falling back to LocalRagGateway: update the except
RemoteRagGatewayError block to record the error (e.g., logger.error or
logger.exception) including the exception details and context (runtime_spec or
other relevant identifiers) and then call await
LocalRagGateway().query(runtime_spec, db=db); reference RemoteRagGatewayError,
LocalRagGateway, query, runtime_spec and db to locate the change.
In `@backend/app/services/rag/retrieval/filters.py`:
- Line 12: The module-level export list __all__ is unsorted; please reorder it
alphabetically (e.g., ["build_elasticsearch_filters", "parse_metadata_filters"])
so __all__ is deterministic and satisfies static analysis; update the __all__
assignment near the top of the module (where __all__ is defined) to the sorted
list while leaving the functions parse_metadata_filters and
build_elasticsearch_filters untouched.
In `@backend/app/services/rag/runtime_specs.py`:
- Around line 5-17: The local redeclaration of RetrievalPolicy should be removed
and the alias imported from shared.models to avoid drift; replace the
Literal[...] block with an import of RetrievalPolicy from shared.models (same
module that defines RemoteKnowledgeBaseQueryConfig, RuntimeEmbeddingModelConfig,
RuntimeRetrievalConfig, RuntimeRetrieverConfig) and update any local references
to use that imported RetrievalPolicy symbol.
In `@backend/app/services/rag/storage/elasticsearch_backend.py`:
- Line 12: The __all__ export list is not alphabetized; update the module-level
__all__ (currently ["ElasticsearchBackend", "Elasticsearch"]) to be sorted
alphabetically so the exports read ["Elasticsearch", "ElasticsearchBackend"] to
satisfy RUF022 and consistent style.
In `@backend/tests/services/rag/test_runtime_specs.py`:
- Around line 143-154: The test test_index_source_enforces_coherent_shape
currently uses pytest.mark.parametrize with a single case; simplify by removing
the `@pytest.mark.parametrize` decorator and directly call
IndexSource(**{"source_type": "attachment"}) inside the with
pytest.raises(ValidationError, match="Field required") block (or alternatively
expand to more cases if you intend to add them), keeping the test name and the
assertion on IndexSource the same.
In `@knowledge_engine/knowledge_engine/embedding/factory.py`:
- Around line 30-80: The function _create_embedding_model_from_resolved_values
is doing protocol routing and object construction; refactor by extracting
protocol-specific builders: implement a helper _build_openai_embedding(protocol,
model_id, api_key, base_url, custom_headers, dimensions) that returns either
CustomEmbedding or OpenAIEmbedding for protocol "openai", and another helper
_build_custom_protocol_embedding(protocol, model_name, model_id, api_key,
base_url, custom_headers, dimensions) that validates base_url and returns
CustomEmbedding for protocols in {"cohere","jina","custom"}; update
_create_embedding_model_from_resolved_values to call those helpers for the
corresponding protocol branches and keep the final unsupported-protocol
ValueError in place.
In `@knowledge_engine/knowledge_engine/retrieval/filters.py`:
- Around line 20-98: Split parse_metadata_filters into smaller helpers: move the
OPERATOR_MAP constant out of parse_metadata_filters to a module-level constant
(e.g., OPERATOR_MAP) and create a helper function to build the filters list
(e.g., _build_filters_from_conditions(knowledge_id, metadata_condition)) that
contains the loop creating ExactMatchFilter, MetadataFilter and mapping
operators, plus a small helper (e.g., _get_condition_type(metadata_condition))
to determine the "and"/"or" string; then simplify parse_metadata_filters to call
these helpers and return MetadataFilters(filters=..., condition=...), keeping
the existing semantics and using ExactMatchFilter, MetadataFilter and
MetadataFilters as before.
In `@knowledge_engine/knowledge_engine/storage/factory.py`:
- Around line 28-76: Both functions create_storage_backend_from_config and
create_storage_backend_from_runtime_config duplicate the same validation against
STORAGE_BACKEND_REGISTRY and construction of the config dict; refactor by
extracting a small helper (e.g., _validate_and_build_storage_config or
_get_storage_backend_class_and_config) that accepts the raw inputs (storage_type
and a mapping or explicit params) and returns the validated backend_class and
the normalized config dict (with keys
"url","username","password","apiKey","indexStrategy","ext"); then replace the
bodies of create_storage_backend_from_config and
create_storage_backend_from_runtime_config to call that helper and simply return
backend_class(config). Ensure the helper handles lowercasing the type, raises
the same ValueError message when missing/unsupported, and applies defaults for
indexStrategy and ext.
In `@knowledge_engine/knowledge_engine/storage/milvus_backend.py`:
- Around line 59-61: The class attribute _milvusclient_config is defined with a
mutable default {} which can lead to shared-state bugs; change it to a type-only
annotation (e.g., _milvusclient_config: Dict[str, Any]) without assigning {} in
the class body and ensure the instance-specific dict is created in __init__ (or
where the lazy async client configuration is set) so subclasses that omit
super().__init__ won't accidentally share state.
- Around line 720-726: The current client.close() calls in the finally blocks of
the methods use a try-except-pass pattern that silently ignores exceptions. To
improve debugging, update these blocks in the methods containing client.close()
(including the finally blocks in the current method, test_connection, and
get_all_chunks) to catch exceptions and log the failure using a debug or error
log instead of passing silently. This will help surface connection close
failures without interrupting normal operation.
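A sketch combining both Milvus suggestions above: an annotation-only class attribute so instances never share a dict, and a close helper that logs instead of swallowing failures. The class and helper names are illustrative:

```python
import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)


class MilvusBackendSketch:
    # Annotation only: no mutable {} default shared across instances.
    _milvusclient_config: Dict[str, Any]

    def __init__(self) -> None:
        self._milvusclient_config = {}  # each instance gets its own dict


def close_client_quietly(client: Any) -> None:
    """For finally blocks: never raise, but leave a trace for debugging."""
    try:
        client.close()
    except Exception:
        logger.debug("Failed to close Milvus client", exc_info=True)
```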
In `@knowledge_engine/knowledge_engine/storage/qdrant_backend.py`:
- Around line 563-570: The logging import is done inside an exception handler;
move "import logging" to the module level and create a module-level logger
(e.g., logger = logging.getLogger(__name__)) near the other imports so it's
consistent with other backends, then update the except block that currently
references a locally-imported logger to use the module-level logger (the block
that logs "[Qdrant] Failed to get all chunks for KB {knowledge_id}: {e}").
In `@knowledge_engine/tests/test_storage_factory.py`:
- Around line 39-46: The test currently asserts that captured["config"] lacks
the "type" key but doesn't document whether that omission is intentional; update
the test in test_storage_factory (the test that sets up captured and asserts
captured["config"]) to explicitly verify and document the transformation by
either adding an assertion that the original input contained "type": "fake" and
that the factory removed it, or by adding a concise comment explaining that
StorageFactory strips the backend "type" before passing config to the
constructor (reference the captured variable and the factory/create method used
in the test).
In `@knowledge_runtime/knowledge_runtime/api/internal/rag.py`:
- Around line 20-49: The endpoints index_document, delete_document_index, and
test_connection return generic dicts while query uses
response_model=RemoteQueryResponse; update each route decorator to include the
appropriate response_model (e.g., response_model=RemoteIndexResponse for
index_document, response_model=RemoteDeleteDocumentIndexResponse for
delete_document_index, and response_model=RemoteTestConnectionResponse for
test_connection) and change the function return type annotations to those
concrete response model classes; ensure you import the model classes and keep
the handlers calls (runtime_handlers.index_document,
runtime_handlers.delete_document_index, runtime_handlers.test_connection)
unchanged so they return/serialize to the specified models for consistent docs
and typing.
In `@knowledge_runtime/knowledge_runtime/services/handlers.py`:
- Around line 28-110: The four remote handler methods (index_document, query,
delete_document_index, test_connection) lack OpenTelemetry tracing; wrap each
async method with the `@trace_async` decorator (using an appropriate span_name and
tracer_name) and provide an extract_attributes callable that pulls request
identifiers and key context (e.g., request.request_id or request.id,
request.knowledge_base_id/knowledge_base_configs items,
request.index_owner_user_id, document_id/document_ref, query text or
retrieval_config identifiers) so every top-level entrypoint opens a span and
attaches those attributes for correlation; ensure the decorator is
imported/available and apply it to each method signature.
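A rough sketch of one decorated entrypoint, assuming the shared trace_async decorator exposes the (span_name, tracer_name, extract_attributes) signature quoted in the guidelines and that extract_attributes receives the handler's arguments; the attribute keys are illustrative:

```python
from shared.telemetry.decorators import trace_async  # import path per the guidelines


def _query_attributes(self, request) -> dict:
    # Pull correlation identifiers off the incoming request.
    return {
        "rag.request_id": str(getattr(request, "request_id", "")),
        "rag.kb_count": len(getattr(request, "knowledge_base_configs", []) or []),
    }


class RuntimeHandlers:
    @trace_async(
        span_name="knowledge_runtime.query",
        tracer_name="knowledge_runtime",
        extract_attributes=_query_attributes,
    )
    async def query(self, request):
        ...  # existing handler body unchanged
```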
In `@knowledge_runtime/tests/test_internal_rag.py`:
- Around line 218-227: Add a new success-path test in
knowledge_runtime/tests/test_internal_rag.py that uses TestClient(create_app())
to POST to "/internal/rag/query" with a valid Authorization header and a payload
similar to the existing test; seed or mock the underlying retrieval/aggregation
layer (or use test fixtures) so the aggregation path runs and assert
response.status_code == 200 and response.json() contains the expected merged
records; specifically verify that records are merged correctly across sources,
sorted by relevance, truncated to any configured max_results, and that each
returned item includes the shaped fields "document_id" and "index_family" (use
the same helper/mocking utilities as other tests to locate functions interacting
with the retrieval/aggregation pipeline and validate merging, sorting, and
truncation behavior).
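A rough shape for such a success-path test; the app factory import, patch target, auth token, and payload fields are all assumptions to adapt to the real fixtures rather than verified signatures:

```python
from fastapi.testclient import TestClient

from knowledge_runtime.main import create_app  # assumed app factory location


def test_query_success(monkeypatch) -> None:
    async def fake_query(request):
        # Pre-merged, relevance-sorted records the endpoint should pass through.
        return {
            "records": [
                {"document_id": "doc-1", "index_family": "chunk_vector", "score": 0.9},
                {"document_id": "doc-2", "index_family": "chunk_vector", "score": 0.7},
            ]
        }

    # Hypothetical patch target; point it at the real retrieval/aggregation entrypoint.
    monkeypatch.setattr(
        "knowledge_runtime.api.internal.rag.runtime_handlers.query", fake_query
    )

    client = TestClient(create_app())
    response = client.post(
        "/internal/rag/query",
        headers={"Authorization": "Bearer test-token"},
        json={"query": "hello", "knowledge_base_ids": ["kb-1"]},
    )

    assert response.status_code == 200
    records = response.json()["records"]
    assert all("document_id" in r and "index_family" in r for r in records)
    scores = [r["score"] for r in records]
    assert scores == sorted(scores, reverse=True)  # sorted by relevance
```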
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
```python
    except ValueError as e:
        raise HTTPException(status_code=400, detail=str(e))
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
```
Chain exceptions to preserve traceback for debugging.
Per the static analysis hint, exceptions raised within except clauses should use raise ... from err to preserve the original traceback, which aids debugging.
🔧 Proposed fix

```diff
-except ValueError as e:
-    raise HTTPException(status_code=400, detail=str(e))
+except ValueError as e:
+    raise HTTPException(status_code=400, detail=str(e)) from e
 except Exception as e:
-    raise HTTPException(status_code=500, detail=str(e))
+    raise HTTPException(status_code=500, detail=str(e)) from e
```
🧰 Tools
🪛 Ruff (0.15.9)
[warning] 114-114: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
[warning] 115-115: Do not catch blind exception: Exception
(BLE001)
[warning] 116-116: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/api/endpoints/rag.py` around lines 113 - 116, The except blocks
in rag.py currently re-raise HTTPException without chaining the original
exceptions, losing the original tracebacks; update both exception handlers in
the function containing these blocks to use exception chaining (e.g., raise
HTTPException(status_code=400, detail=str(e)) from e and raise
HTTPException(status_code=500, detail=str(e)) from e) so the original traceback
is preserved for debugging—locate the ValueError and generic Exception handlers
(the except ValueError as e and except Exception as e blocks) and add "from e"
to each raised HTTPException.
| @field_validator("RAG_RUNTIME_MODE", mode="before") | ||
| @classmethod | ||
| def parse_rag_runtime_mode(cls, v: Any) -> str | dict[str, str]: | ||
| """Parse RAG runtime mode from a global value or JSON operation map.""" | ||
| if v is None: | ||
| return "local" | ||
| if isinstance(v, Mapping): | ||
| return { | ||
| str(key).strip().lower(): str(value).strip().lower() | ||
| for key, value in v.items() | ||
| if str(key).strip() and str(value).strip() | ||
| } | ||
| if isinstance(v, str): | ||
| raw = v.strip() | ||
| if not raw: | ||
| return "local" | ||
| if raw.startswith("{"): | ||
| try: | ||
| parsed = json.loads(raw) | ||
| except json.JSONDecodeError: | ||
| return raw.lower() | ||
| if isinstance(parsed, Mapping): | ||
| return { | ||
| str(key).strip().lower(): str(value).strip().lower() | ||
| for key, value in parsed.items() | ||
| if str(key).strip() and str(value).strip() | ||
| } | ||
| return raw.lower() | ||
| return v | ||
|
|
Validate RAG_RUNTIME_MODE values (local/remote) during parsing.
Line 446 documents only two valid modes, but Lines 238-267 and Lines 543-550 currently accept arbitrary strings (e.g., "remtoe"). That can silently misroute operations instead of failing fast on misconfiguration.
Suggested fix

```diff
+ALLOWED_RAG_RUNTIME_MODES = {"local", "remote"}
+
+def _normalize_rag_mode(value: Any) -> str:
+    mode = str(value).strip().lower()
+    if mode not in ALLOWED_RAG_RUNTIME_MODES:
+        raise ValueError(
+            f"Invalid RAG runtime mode: {value}. "
+            f"Expected one of: {sorted(ALLOWED_RAG_RUNTIME_MODES)}"
+        )
+    return mode
+
 @field_validator("RAG_RUNTIME_MODE", mode="before")
 @classmethod
 def parse_rag_runtime_mode(cls, v: Any) -> str | dict[str, str]:
     """Parse RAG runtime mode from a global value or JSON operation map."""
     if v is None:
         return "local"
     if isinstance(v, Mapping):
-        return {
-            str(key).strip().lower(): str(value).strip().lower()
-            for key, value in v.items()
-            if str(key).strip() and str(value).strip()
-        }
+        normalized: dict[str, str] = {}
+        for key, value in v.items():
+            key_s = str(key).strip().lower()
+            if not key_s:
+                continue
+            normalized[key_s] = _normalize_rag_mode(value)
+        return normalized
     if isinstance(v, str):
         raw = v.strip()
         if not raw:
             return "local"
         if raw.startswith("{"):
             try:
                 parsed = json.loads(raw)
             except json.JSONDecodeError:
-                return raw.lower()
+                return _normalize_rag_mode(raw)
             if isinstance(parsed, Mapping):
-                return {
-                    str(key).strip().lower(): str(value).strip().lower()
-                    for key, value in parsed.items()
-                    if str(key).strip() and str(value).strip()
-                }
-        return raw.lower()
+                normalized: dict[str, str] = {}
+                for key, value in parsed.items():
+                    key_s = str(key).strip().lower()
+                    if not key_s:
+                        continue
+                    normalized[key_s] = _normalize_rag_mode(value)
+                return normalized
+        return _normalize_rag_mode(raw)
     return v
```

Also applies to: 543-550
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/core/config.py` around lines 238 - 267, Update the
parse_rag_runtime_mode validator to enforce only "local" or "remote" values:
when v is None return "local"; when v is a Mapping parse keys/values to
lowercased strings but validate each resulting value is exactly "local" or
"remote" and raise ValueError for any invalid entry; when v is a string trim
and, if JSON, decode to a Mapping and validate as above, otherwise ensure the
final lowercased string equals "local" or "remote" and raise ValueError if not;
apply the same strict validation logic to the other validator referenced around
lines 543-550 so both parse/field_validator paths only accept "local" or
"remote" and fail-fast on misconfiguration.
```python
async def test_connection_local(
    spec: ConnectionTestRuntimeSpec,
    *,
    db: Session | None = None,
) -> dict:
    del db
    storage_backend = create_storage_backend_from_runtime_config(spec.retriever_config)
    success = await asyncio.to_thread(storage_backend.test_connection)
    return {
        "success": success,
        "message": "Connection successful" if success else "Connection failed",
    }
```
🛠️ Refactor suggestion | 🟠 Major
Add OpenTelemetry tracing to this new async executor.
test_connection_local is a new async boundary and should be wrapped with @trace_async(...) for consistent observability.
📈 Proposed fix

```diff
 from app.services.rag.runtime_specs import ConnectionTestRuntimeSpec
 from app.services.rag.storage.factory import create_storage_backend_from_runtime_config
+from shared.telemetry.decorators import trace_async
 @@
+@trace_async(
+    span_name="rag.test_connection_local",
+    tracer_name="backend.services.rag",
+)
 async def test_connection_local(
     spec: ConnectionTestRuntimeSpec,
     *,
     db: Session | None = None,
-) -> dict:
+) -> dict[str, bool | str]:
```

As per coding guidelines, "Use @trace_async(span_name, tracer_name, extract_attributes) decorator to trace entire async functions in OpenTelemetry".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/services/rag/local_data_plane/administration.py` around lines 15
- 26, The new async executor test_connection_local lacks OpenTelemetry tracing;
add the `@trace_async` decorator to the async function (use trace_async with a
clear span name like "administration.test_connection_local" and the appropriate
tracer name, e.g., "backend") and supply an extract_attributes callable that
records useful inputs (e.g., spec.retriever_config or other non-sensitive
attributes) so the whole async boundary is traced; update the function
definition for test_connection_local to be decorated with trace_async
accordingly.
```diff
 def _build_index_storage_backend(
     spec: IndexRuntimeSpec,
     *,
     db: Session,
 ):
     if spec.retriever_config is not None:
         return create_storage_backend_from_runtime_config(spec.retriever_config)
     retriever = retriever_kinds_service.get_retriever(
         db=db,
-        user_id=runtime_user_id,
-        name=retriever_name,
-        namespace=retriever_namespace,
+        user_id=spec.index_owner_user_id,
+        name=spec.retriever_name,
+        namespace=spec.retriever_namespace,
     )
     if retriever is None:
-        logger.warning(
-            "Retriever %s not found for KB %s during delete-index cleanup",
-            retriever_name,
-            knowledge_base_id,
-        )
-        return {
-            "status": "skipped",
-            "reason": "retriever_not_found",
-            "knowledge_id": str(knowledge_base_id),
-            "doc_ref": document_ref,
-        }
+        raise ValueError("retriever_not_found")
     return create_storage_backend(retriever)
```
Add return type annotations to helper functions.
Per coding guidelines, type hints are required. The helper functions _build_index_storage_backend and _build_index_embed_model are missing return type annotations.
🔧 Proposed fix

```diff
+from app.services.rag.storage.base import BaseStorageBackend
+from llama_index.core.base.embeddings.base import BaseEmbedding

 def _build_index_storage_backend(
     spec: IndexRuntimeSpec,
     *,
     db: Session,
-):
+) -> BaseStorageBackend:

 def _build_index_embed_model(
     spec: IndexRuntimeSpec,
     *,
     db: Session,
-):
+) -> BaseEmbedding:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/services/rag/local_data_plane/indexing.py` around lines 164 -
180, The helper functions _build_index_storage_backend and
_build_index_embed_model are missing return type annotations; update their
signatures to include appropriate return types (e.g., the concrete storage
backend type returned by
create_storage_backend/create_storage_backend_from_runtime_config and the embed
model type returned by whatever create_embed_model/_embed_model_factory returns)
so static typing is satisfied; locate the functions named
_build_index_storage_backend and _build_index_embed_model in indexing.py and add
explicit return type hints matching the actual classes/interfaces they return
(or use a suitable Protocol/Union/typing.Any if multiple types are possible),
and update any imports if necessary to reference those types.
| "parse_metadata_filters", | ||
| "build_elasticsearch_filters", | ||
| ] | ||
| __all__ = ["parse_metadata_filters", "build_elasticsearch_filters"] |
Sort __all__ to satisfy Ruff RUF022.
Current ordering is not isort-style and will keep triggering lint noise.
🧹 Proposed fix

```diff
-__all__ = ["parse_metadata_filters", "build_elasticsearch_filters"]
+__all__ = ["build_elasticsearch_filters", "parse_metadata_filters"]
```
🧰 Tools
🪛 Ruff (0.15.9)
[warning] 12-12: __all__ is not sorted
Apply an isort-style sorting to __all__
(RUF022)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/services/rag/retrieval/__init__.py` at line 12, The __all__
export list in the module is unsorted which triggers Ruff RUF022; update the
__all__ declaration to list the exported names in sorted order, e.g., ensure
"__all__ = ['build_elasticsearch_filters', 'parse_metadata_filters']" so the
symbol names (build_elasticsearch_filters and parse_metadata_filters) are
alphabetized.
```python
def create_storage_backend_from_runtime_config(
    retriever_config: RuntimeRetrieverConfig,
):
```
Add return type annotation.
The function create_storage_backend_from_runtime_config is missing a return type annotation. Per coding guidelines, type hints are required.
```diff
 def create_storage_backend_from_runtime_config(
     retriever_config: RuntimeRetrieverConfig,
-):
+) -> BaseStorageBackend:
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@knowledge_engine/knowledge_engine/storage/factory.py` around lines 56 - 58,
The function create_storage_backend_from_runtime_config lacks a return type
hint; update its signature to include the concrete storage interface (for
example -> StorageBackend) and import that type from the module that defines the
storage backend interface (e.g., the StorageBackend or BaseStorage class in your
storage backends module). Ensure the annotation matches the actual object
returned by create_storage_backend_from_runtime_config (or use a Union of
concrete backend classes if it can return multiple types).
```python
RemoteQueryRecord(
    content=record.get("content", ""),
    title=record.get("title", "Unknown"),
    score=record.get("score"),
    metadata=record.get("metadata"),
    knowledge_base_id=knowledge_base_id,
    document_id=self._extract_document_id(record.get("metadata")),
    index_family="chunk_vector",
)
```
Preserve the actual index family in RemoteQueryRecord.
index_family is hard-coded to "chunk_vector", so remote mode will mislabel summary-family hits and diverge from enabled_index_families / retrieval_policy once non-chunk retrieval is enabled. Derive this from the engine result, or reject non-chunk requests until the runtime can preserve it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@knowledge_runtime/knowledge_runtime/services/handlers.py` around lines 148 -
156, The RemoteQueryRecord construction currently hard-codes
index_family="chunk_vector", which mislabels non-chunk results; update where
RemoteQueryRecord is created (the RemoteQueryRecord(...) call) to derive
index_family from the engine result/record (e.g., a field on record that denotes
the family) instead of using the literal "chunk_vector", and if the
engine/record does not supply a supported index family, raise or reject the
request (or explicitly handle mapping from engine types to allowed values) so
enabled_index_families / retrieval_policy remain consistent with results.
```python
parsed_url = urlparse(request.content_ref.url)
file_name = PurePosixPath(parsed_url.path).name
if file_name:
    suffix = PurePosixPath(file_name).suffix.lower()
    return file_name, suffix
```
Don't treat backend stream URLs as source filenames.
For backend_attachment_stream, the basename of .../content/{attachment_id} is just the attachment id, so this branch stores a meaningless numeric source_file instead of using the fallback name. That leaks poor titles into indexed metadata and downstream query results whenever presigned URLs are unavailable.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@knowledge_runtime/knowledge_runtime/services/handlers.py` around lines 176 -
180, The code is using the URL basename (parsed_url/PurePosixPath(file_name)) as
a source filename which yields meaningless attachment IDs for
backend_attachment_stream URLs; change the branch in the function that handles
request.content_ref.url so that after extracting file_name you detect and reject
non-meaningful names (e.g., numeric IDs or paths matching
"/content/{attachment_id}" or where file_name has no extension) and instead use
the provided fallback/source name (e.g., request.content_ref.name or the
function's fallback parameter) as the source_file; still compute suffix from a
real filename when present, otherwise set suffix to an empty string or infer
from fallback, and return the chosen source_file and suffix (update any
references to parsed_url, file_name, PurePosixPath to reflect this logic).
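A sketch of the fallback-aware resolution described in this comment; the helper name and the fallback parameter are illustrative:

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def resolve_source_file(url: str, fallback_name: str) -> tuple[str, str]:
    """Prefer a real filename; fall back when the basename is just an attachment id."""
    file_name = PurePosixPath(urlparse(url).path).name
    suffix = PurePosixPath(file_name).suffix.lower()
    if not file_name or file_name.isdigit() or not suffix:
        # e.g. .../content/12345 from a backend attachment stream URL
        file_name = fallback_name
        suffix = PurePosixPath(fallback_name).suffix.lower()
    return file_name, suffix


assert resolve_source_file("https://x.invalid/content/12345", "report.pdf") == ("report.pdf", ".pdf")
assert resolve_source_file("https://x.invalid/files/notes.txt", "fallback.bin") == ("notes.txt", ".txt")
```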
🧹 Nitpick comments (1)
backend/app/services/rag/storage/__init__.py (1)
7-19: Import ordering note. Per PEP 8 and isort convention, imports are typically ordered: standard library → third-party packages → local application imports. Here, local imports (app.services.rag.storage.factory) appear before the knowledge_engine package imports. If knowledge_engine is treated as a third-party/external package (even if it's a sibling package in this monorepo), consider reordering.
However, if the linter/formatter doesn't flag this or if knowledge_engine is configured as a first-party package in your isort config, this is fine as-is.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/rag/storage/__init__.py` around lines 7 - 19, The imports in storage/__init__.py are out of conventional order; move the knowledge_engine imports (BaseStorageBackend, ChunkMetadata, ElasticsearchBackend, MilvusBackend, QdrantBackend) above the app.services.rag.storage.factory imports (create_storage_backend, create_storage_backend_from_config, create_storage_backend_from_runtime_config, get_all_storage_retrieval_methods, get_supported_retrieval_methods, get_supported_storage_types) so third-party packages come before local application imports, or alternatively mark `knowledge_engine` as first-party in your isort config if that ordering is intentional.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@backend/app/services/rag/storage/__init__.py`:
- Around line 7-19: The imports in storage/__init__.py are out of conventional
order; move the knowledge_engine imports (BaseStorageBackend, ChunkMetadata,
ElasticsearchBackend, MilvusBackend, QdrantBackend) above the
app.services.rag.storage.factory imports (create_storage_backend,
create_storage_backend_from_config, create_storage_backend_from_runtime_config,
get_all_storage_retrieval_methods, get_supported_retrieval_methods,
get_supported_storage_types) so third-party packages come before local
application imports, or alternatively mark `knowledge_engine` as first-party in
your isort config if that ordering is intentional.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 987909b7-915c-4af7-97d2-c08cc7dbb283
📒 Files selected for processing (3)
backend/app/services/rag/embedding/__init__.py
backend/app/services/rag/retrieval/__init__.py
backend/app/services/rag/storage/__init__.py
Actionable comments posted: 5
♻️ Duplicate comments (8)
knowledge_engine/knowledge_engine/embedding/custom.py (1)
75-77: ⚠️ Potential issue | 🟡 Minor: Harden embedding extraction for malformed API payloads.
Direct indexing into nested fields can raise opaque errors and obscure upstream API issues.
Proposed change
```diff
 response.raise_for_status()
-embedding = response.json()["data"][0]["embedding"]
+try:
+    payload = response.json()
+    embedding = payload["data"][0]["embedding"]
+except (KeyError, IndexError, TypeError) as exc:
+    raise ValueError(f"Malformed embedding API response: {exc}") from exc
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_engine/knowledge_engine/embedding/custom.py` around lines 75 - 77, The code directly indexes response.json()["data"][0]["embedding"] which can raise opaque KeyError/IndexError/TypeError for malformed API payloads; replace that with a defensive extraction: call payload = response.json(), verify isinstance(payload, dict), ensure "data" in payload and isinstance(payload["data"], list) and len(payload["data"])>0 and isinstance(payload["data"][0], dict) and "embedding" in payload["data"][0], then set embedding = payload["data"][0]["embedding"]; if any check fails raise a clear ValueError (or custom exception) that includes the full payload and a descriptive message so upstream API issues are visible (wrap extraction in try/except to re-raise with payload on unexpected errors).
shared/models/knowledge_runtime_protocol.py (1)
21-27: ⚠️ Potential issue | 🟠 Major: Mask transport-layer tokens in these models.
Line 26 and Line 57 keep live bearer/download tokens as plain str, so repr, debug logs, and accidental model_dump() calls can leak credentials. Use SecretStr or an excluded field here and unwrap it explicitly only at the transport edge (for example knowledge_runtime/knowledge_runtime/services/content_fetcher.py:20 and any header builders). Please verify the callers still send the raw token value after the change.
Also applies to: 53-57
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@shared/models/knowledge_runtime_protocol.py` around lines 21 - 27, The auth_token fields in BackendAttachmentStreamContentRef and the similar download model should not be plain str (they risk leaking in repr/model_dump); change their type to pydantic SecretStr (or an excluded field with exclude=True in the model config) and update any code that reads the token to explicitly unwrap it at the transport edge (e.g., in knowledge_runtime/services/content_fetcher.py and any header builder functions) so callers still provide the raw token but it is only revealed when creating request headers; ensure model serialization excludes the secret by default.
backend/app/services/rag/remote_gateway.py (2)
53-65: ⚠️ Potential issue | 🟠 Major: Reject a blank internal token in __init__.
app.core.config.Settings.INTERNAL_SERVICE_TOKEN defaults to an empty string, so Lines 61-65 currently build Authorization: Bearer and turn a startup misconfig into opaque 401s. Validate self._token once in the constructor and fail fast.
Suggested fix
self._base_url = (base_url or settings.KNOWLEDGE_RUNTIME_URL).rstrip("/") self._token = token if token is not None else settings.INTERNAL_SERVICE_TOKEN + if not self._token: + raise ValueError( + "INTERNAL_SERVICE_TOKEN must be configured for RemoteRagGateway" + ) self._timeout = timeout🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/rag/remote_gateway.py` around lines 53 - 65, The constructor __init__ of RemoteGateway currently allows an empty INTERNAL_SERVICE_TOKEN and _build_headers then emits an Authorization: Bearer header causing opaque 401s; validate that the resolved token (self._token) is not None or an empty string in __init__ (after assigning from base_url/token/settings.INTERNAL_SERVICE_TOKEN) and raise a clear exception (e.g., ValueError) if it's blank so the service fails fast; keep _build_headers as-is but assume it will always have a non-empty token after this validation.
67-77: ⚠️ Potential issue | 🟠 Major: Normalize transport/schema failures before they escape the gateway.
Lines 68-77 can still raise raw httpx transport or JSON decode errors, and Line 169 can raise ValidationError. Those exceptions bypass RemoteRagGatewayError, so callers lose the gateway's failure contract and any remote→local fallback never triggers. Please verify _post_model() and query() wrap these failures and re-raise RemoteRagGatewayError.
Also applies to: 150-169
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/rag/remote_gateway.py` around lines 67 - 77, _post_model and query currently let raw httpx transport/decoding errors and pydantic ValidationError escape the gateway; wrap the network/JSON/validation work in try/except blocks and re-raise RemoteRagGatewayError so callers see the gateway failure contract. Specifically, in _post_model, wrap the AsyncClient.post call and the response.json() call in a try/except that catches httpx.TransportError/httpx.HTTPError (or broader httpx.Error) and JSONDecodeError (or httpx.DecodingError) and raise RemoteRagGatewayError(..., cause=exc) via _raise_remote_error or by constructing RemoteRagGatewayError with the original exception attached. Likewise, in query wrap the code that constructs/validates the response model (the pydantic validation path that can raise pydantic.ValidationError) and convert that into RemoteRagGatewayError so validation failures are normalized to the gateway error contract.
backend/app/services/rag/runtime_resolver.py (2)
217-226: ⚠️ Potential issue | 🔴 Critical: Query/delete runtime specs are still loading KBs without requester scoping.
_get_knowledge_base_record() only filters by global Kind.id, and Lines 226 and 261 reuse it while building delete/query runtime specs. That lets any active KB id drive retriever/model resolution even when the caller should not have access; thread user_id through this helper and use the same access-checked lookup pattern as get_kb_index_info() or KnowledgeService.get_knowledge_base().
Based on learnings, "Kind resource is uniquely identified by THREE fields: namespace, name, and user_id - always use all three fields to locate a Kind".
Also applies to: 252-263, 326-340
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/rag/runtime_resolver.py` around lines 217 - 226, build_delete_runtime_spec and similar helpers call _get_knowledge_base_record without scoping by requester user, allowing KB lookup by id alone; thread a requester_user_id (or index_owner_user_id) through build_delete_runtime_spec, build_query_runtime_spec, and any callers, and update _get_knowledge_base_record to require and use the requester_user_id when resolving the Kind (or replace the call with the same access-checked lookup used by get_kb_index_info / KnowledgeService.get_knowledge_base) so the Kind is located using namespace, name and user_id; apply the same change to the other affected methods that reuse _get_knowledge_base_record (the query/delete runtime spec builders and the resolver code around build_query_runtime_spec/build_delete_runtime_spec).
433-455: ⚠️ Potential issue | 🔴 Critical: Keep non-default model lookups tenant-scoped too.
The default branch applies the tenant/shared filter, but Lines 446-455 drop it entirely. If two tenants define the same namespace/name, this branch can resolve another tenant's model and forward its decrypted config downstream.
return ( db.query(Kind) .filter( Kind.kind == "Model", Kind.name == model_name, Kind.namespace == model_namespace, Kind.is_active == True, ) + .filter((Kind.user_id == user_id) | (Kind.user_id == 0)) + .order_by(Kind.user_id.desc()) .first() )Based on learnings, "Kind resource is uniquely identified by THREE fields:
namespace,name, anduser_id- always use all three fields to locate a Kind".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/rag/runtime_resolver.py` around lines 433 - 455, The non-`default` branch in runtime_resolver currently omits tenant scoping so it can return another tenant's Model; update the query in that branch to include the tenant/user filter so the Kind is located by the three identifying fields (Kind.kind, Kind.name, Kind.namespace) plus Kind.user_id. Specifically, in the code that builds the db.query(Kind) for the non-`default` case (the same place that currently filters Kind.kind, Kind.name, Kind.namespace, Kind.is_active), add a filter for Kind.user_id == user_id (or use the same (Kind.user_id == user_id) | (Kind.user_id == 0) logic if you intentionally want shared fallbacks) and then .order_by(Kind.user_id.desc()).first() so lookups are always tenant-scoped and uniquely identified.knowledge_runtime/knowledge_runtime/services/handlers.py (2)
147-156:⚠️ Potential issue | 🟠 MajorPreserve real
index_familyfrom engine records instead of hard-coding.Line 155 hard-codes
"chunk_vector", which can mislabel results once non-chunk families are returned.🧩 Suggested mapping update
- return [ - RemoteQueryRecord( - content=record.get("content", ""), - title=record.get("title", "Unknown"), - score=record.get("score"), - metadata=record.get("metadata"), - knowledge_base_id=knowledge_base_id, - document_id=self._extract_document_id(record.get("metadata")), - index_family="chunk_vector", - ) - for record in records - ] + built_records: list[RemoteQueryRecord] = [] + for record in records: + metadata = record.get("metadata") if isinstance(record.get("metadata"), dict) else {} + index_family = record.get("index_family") or metadata.get("index_family") + if not index_family: + raise ValueError("Missing index_family in query record payload") + built_records.append( + RemoteQueryRecord( + content=record.get("content", ""), + title=record.get("title", "Unknown"), + score=record.get("score"), + metadata=record.get("metadata"), + knowledge_base_id=knowledge_base_id, + document_id=self._extract_document_id(record.get("metadata")), + index_family=index_family, + ) + ) + return built_records🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_runtime/knowledge_runtime/services/handlers.py` around lines 147 - 156, The code is hard-coding index_family="chunk_vector" when building RemoteQueryRecord in the handler; change it to preserve the source index family by reading the engine record's field (e.g., record.get("index_family") or record.get("family")) and fall back to "chunk_vector" if missing, so update the RemoteQueryRecord construction in the handler that builds the record (the block creating RemoteQueryRecord with document_id from _extract_document_id) to use the record-sourced value instead of the literal "chunk_vector".
180-184:⚠️ Potential issue | 🟡 MinorAvoid using attachment-id URL basenames as
source_file.Lines 180-184 treat any URL basename as filename; for
backend_attachment_streampaths this can become a numeric id and pollute document metadata.🛠️ Safer filename resolution
parsed_url = urlparse(request.content_ref.url) file_name = PurePosixPath(parsed_url.path).name if file_name: suffix = PurePosixPath(file_name).suffix.lower() - return file_name, suffix + if suffix or not file_name.isdigit(): + return file_name, suffix🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_runtime/knowledge_runtime/services/handlers.py` around lines 180 - 184, The code uses the URL basename (file_name from parsed_url.path) as the source filename which can be a numeric attachment id for backend_attachment_stream URLs; update the logic around parsed_url, file_name and suffix so that if the path contains "backend_attachment_stream" or the derived file_name is purely numeric (file_name.isdigit()), you do NOT treat it as a real filename — instead return a neutral value (e.g., None or empty string) or fall back to other metadata; otherwise keep the existing behavior of returning file_name and suffix. Ensure the checks are applied where parsed_url, file_name and suffix are computed so numeric/attachment-id basenames are excluded from document metadata.
🧹 Nitpick comments (3)
knowledge_engine/tests/test_retrieval_filters.py (1)
8-24: Consider adding edge cases for completeness. While this test covers the mixed-case scenario well, consider adding tests for:
- Already-lowercase operators (to verify no regression)
- Unknown/invalid operators (to document the silent-ignore behavior)
- Empty or missing
operator field (to verify the "eq" default)
💡 Example additional test cases
```python
def test_build_elasticsearch_filters_handles_lowercase_operators() -> None:
    """Verify lowercase operators work correctly (regression guard)."""
    filters = build_elasticsearch_filters(
        "kb_1",
        {
            "operator": "and",
            "conditions": [
                {"key": "status", "operator": "eq", "value": "active"},
            ],
        },
    )
    assert {"term": {"metadata.status.keyword": "active"}} in filters


def test_build_elasticsearch_filters_ignores_unknown_operators() -> None:
    """Unknown operators should be silently ignored."""
    filters = build_elasticsearch_filters(
        "kb_1",
        {
            "operator": "and",
            "conditions": [
                {"key": "field", "operator": "UNKNOWN", "value": "val"},
            ],
        },
    )
    # Only the base knowledge_id filter should be present
    assert filters == [{"term": {"metadata.knowledge_id.keyword": "kb_1"}}]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_engine/tests/test_retrieval_filters.py` around lines 8 - 24, Add unit tests exercising edge cases for build_elasticsearch_filters: (1) verify already-lowercase operators (e.g., operator "eq") produce expected filters to guard against regressions by asserting the corresponding term filter appears; (2) verify unknown/invalid operators are silently ignored by calling build_elasticsearch_filters with a condition using an unknown operator and asserting only the base knowledge_id filter is returned; and (3) verify empty or missing condition operator defaults to "eq" by omitting the operator field (or using empty string) and asserting a term filter is produced for that key. Use the existing test file/test naming style and reference build_elasticsearch_filters in each new test.knowledge_engine/knowledge_engine/storage/elasticsearch_backend.py (1)
459-542: Consider splitting list_documents into smaller helpers. This method is getting large; extracting query construction / aggregation parsing / pagination into helpers would improve maintainability.
As per coding guidelines, "Function length: Max 50 lines per function (preferred)".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@knowledge_engine/knowledge_engine/storage/elasticsearch_backend.py` around lines 459 - 542, list_documents is too long; extract the query construction, aggregation parsing and pagination into small helpers: create a helper build_search_body(knowledge_id, size=0) that returns the search_body dict (used where search_body is built), a helper parse_aggregation_buckets(aggregations) that converts response["aggregations"]["documents"]["buckets"] into the all_docs list (extract doc_ref, source_file, chunk_count, created_at) and a helper paginate_documents(all_docs, page, page_size) that returns the paginated slice and total; then have list_documents call Elasticsearch.search with build_search_body, pass response["aggregations"] to parse_aggregation_buckets, and pass the result to paginate_documents before returning the final dict.knowledge_runtime/knowledge_runtime/services/handlers.py (1)
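To make the decomposition concrete, here is a minimal sketch of the pagination helper the prompt names; the helper name and the flat all_docs list of dicts follow the prompt's suggestions and are otherwise assumptions:

def paginate_documents(
    all_docs: list[dict], page: int, page_size: int
) -> tuple[list[dict], int]:
    """Return one page of the aggregated document list plus the total count."""
    total = len(all_docs)
    start = (page - 1) * page_size
    # Slicing never raises on out-of-range pages; it simply yields an empty list.
    return all_docs[start : start + page_size], total

list_documents would then end with something like docs, total = paginate_documents(all_docs, page, page_size) before building the final dict.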
28-110: Instrument handler entrypoints with tracing decorators. The async service handlers are currently untraced, which leaves remote execution paths under-instrumented.
As per coding guidelines, Python async functions should use @trace_async(span_name, tracer_name, extract_attributes) and span helpers (add_span_event, set_span_attribute) for OpenTelemetry.
Verify each finding against the current code and only fix it if needed. In `@knowledge_runtime/knowledge_runtime/services/handlers.py` around lines 28 - 110, The handler entrypoints index_document, query, delete_document_index, and test_connection are missing OpenTelemetry tracing: add the `@trace_async` decorator to each async method (use descriptive span_name like "handlers.index_document" and appropriate tracer_name) and instrument important steps inside each function with span helpers (call add_span_event and set_span_attribute around key operations such as storage_backend creation, service/QueryExecutor instantiation, and external calls like service.index_document_from_binary, executor.execute, service.delete_document, and storage_backend.test_connection) so traces contain attributes like knowledge_base_id, document_id, retrieval_config, and success; ensure extract_attributes passed to trace_async captures request metadata needed for indexing/querying.
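The repo's trace_async wrapper is not shown in this PR excerpt, so as a hedged illustration here is the equivalent instrumentation in plain opentelemetry-api terms; the span and tracer names are placeholders:

from opentelemetry import trace

tracer = trace.get_tracer("knowledge_runtime.handlers")  # placeholder tracer name

async def query(self, request):
    with tracer.start_as_current_span("handlers.query") as span:
        span.set_attribute("rag.kb_count", len(request.knowledge_base_configs))
        for kb_config in request.knowledge_base_configs:
            # Mark each per-knowledge-base execution so failures are attributable.
            span.add_event(
                "executor.start",
                {"knowledge_base_id": str(kb_config.knowledge_base_id)},
            )
            ...  # existing executor construction and execute() call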
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/api/endpoints/internal/rag_content.py`:
- Around line 28-29: The current _binary_stream yields a single bytes buffer
because get_attachment_binary_data() materializes the entire attachment; change
the streaming contract so get_attachment_binary_data() (in ContextService)
returns a chunked iterator or file-like object and update _binary_stream to
iterate/read from that stream in fixed-size chunks (e.g., 8KB) and yield each
chunk, and update the caller in rag_content.py to pass the new
iterator/file-like instead of a bytes object; ensure any storage backends
implement the new stream-returning method so large attachments are streamed
rather than loaded fully into memory.
In `@backend/app/services/rag/runtime_resolver.py`:
- Around line 158-199: The code builds QueryRuntimeSpec for a KB but resolves
tenant-scoped resources using the requester's user_id; instead, resolve
retriever and embedding model configs using the KB owner (kb.user_id) so Kind
lookup uses the correct (namespace, name, user_id) tuple; update the calls to
_build_resolved_retriever_config and _build_resolved_embedding_model_config in
the QueryRuntimeSpec construction to pass user_id=kb.user_id (or an explicit
index_owner_user_id) so the resource resolution matches the KB owner rather than
the requester.
In `@docs/specs/knowledge/2026-04-03-rag-service-extraction-design.md`:
- Line 5: The page title currently repeats the sidebar category by starting with
"Knowledge Runtime 独立化设计"; update the title to remove the redundant "Knowledge"
prefix (e.g., change "Knowledge Runtime 独立化设计" to "Runtime 独立化设计" or another
concise non-redundant variant) so it no longer duplicates the sidebar category.
In `@knowledge_engine/knowledge_engine/embedding/custom.py`:
- Around line 58-68: The async methods _aget_query_embedding and
_aget_text_embedding and the sync method _call_api need OpenTelemetry
decorators: add `@trace_async`(...) to both _aget_query_embedding and
_aget_text_embedding and add `@trace_sync`(...) to _call_api, supplying an
appropriate span name (e.g., "embedding._aget_query_embedding",
"embedding._aget_text_embedding", "embedding._call_api"), the tracer name used
across the repo (use the same tracer_name constant used elsewhere), and the
extract_attributes function used by your tracing utilities so the full function
execution is traced according to repository standards.
In `@knowledge_runtime/knowledge_runtime/services/handlers.py`:
- Around line 49-73: The query handler currently ignores RemoteQueryRequest
fields enabled_index_families and retrieval_policy in the knowledge_base_configs
entries; add a fail-fast validation at the start of the loop in query (or before
executor creation) that checks each kb_config.enabled_index_families and
kb_config.retrieval_policy and raises/returns an explicit error (e.g.,
ValueError or RemoteQueryResponse error) if either is set and not supported,
mentioning the offending knowledge_base_id; reference the query method and the
knowledge_base_configs / kb_config objects to locate where to insert this
validation so the runtime no longer silently accepts unsupported multi-family or
policy requests.
---
Duplicate comments:
In `@backend/app/services/rag/remote_gateway.py`:
- Around line 53-65: The constructor __init__ of RemoteGateway currently allows
an empty INTERNAL_SERVICE_TOKEN and _build_headers then emits an Authorization:
Bearer header causing opaque 401s; validate that the resolved token
(self._token) is not None or an empty string in __init__ (after assigning from
base_url/token/settings.INTERNAL_SERVICE_TOKEN) and raise a clear exception
(e.g., ValueError) if it's blank so the service fails fast; keep _build_headers
as-is but assume it will always have a non-empty token after this validation.
- Around line 67-77: _post_model and query currently let raw httpx
transport/decoding errors and pydantic ValidationError escape the gateway; wrap
the network/JSON/validation work in try/except blocks and re-raise
RemoteRagGatewayError so callers see the gateway failure contract. Specifically,
in _post_model, wrap the AsyncClient.post call and the response.json() call in a
try/except that catches httpx.TransportError/httpx.HTTPError (or broader
httpx.Error) and JSONDecodeError (or httpx.DecodingError) and raise
RemoteRagGatewayError(..., cause=exc) via _raise_remote_error or by constructing
RemoteRagGatewayError with the original exception attached. Likewise, in query
wrap the code that constructs/validates the response model (the pydantic
validation path that can raise pydantic.ValidationError) and convert that into
RemoteRagGatewayError so validation failures are normalized to the gateway error
contract (see the error-wrapping sketch after this list).
In `@backend/app/services/rag/runtime_resolver.py`:
- Around line 217-226: build_delete_runtime_spec and similar helpers call
_get_knowledge_base_record without scoping by requester user, allowing KB lookup
by id alone; thread a requester_user_id (or index_owner_user_id) through
build_delete_runtime_spec, build_query_runtime_spec, and any callers, and update
_get_knowledge_base_record to require and use the requester_user_id when
resolving the Kind (or replace the call with the same access-checked lookup used
by get_kb_index_info / KnowledgeService.get_knowledge_base) so the Kind is
located using namespace, name and user_id; apply the same change to the other
affected methods that reuse _get_knowledge_base_record (the query/delete runtime
spec builders and the resolver code around
build_query_runtime_spec/build_delete_runtime_spec).
- Around line 433-455: The non-`default` branch in runtime_resolver currently
omits tenant scoping so it can return another tenant's Model; update the query
in that branch to include the tenant/user filter so the Kind is located by the
three identifying fields (Kind.kind, Kind.name, Kind.namespace) plus
Kind.user_id. Specifically, in the code that builds the db.query(Kind) for the
non-`default` case (the same place that currently filters Kind.kind, Kind.name,
Kind.namespace, Kind.is_active), add a filter for Kind.user_id == user_id (or
use the same (Kind.user_id == user_id) | (Kind.user_id == 0) logic if you
intentionally want shared fallbacks) and then
.order_by(Kind.user_id.desc()).first() so lookups are always tenant-scoped and
uniquely identified.
In `@knowledge_engine/knowledge_engine/embedding/custom.py`:
- Around line 75-77: The code directly indexes
response.json()["data"][0]["embedding"] which can raise opaque
KeyError/IndexError/TypeError for malformed API payloads; replace that with a
defensive extraction: call payload = response.json(), verify isinstance(payload,
dict), ensure "data" in payload and isinstance(payload["data"], list) and
len(payload["data"])>0 and isinstance(payload["data"][0], dict) and "embedding"
in payload["data"][0], then set embedding = payload["data"][0]["embedding"]; if
any check fails raise a clear ValueError (or custom exception) that includes the
full payload and a descriptive message so upstream API issues are visible (wrap
extraction in try/except to re-raise with payload on unexpected errors).
In `@knowledge_runtime/knowledge_runtime/services/handlers.py`:
- Around line 147-156: The code is hard-coding index_family="chunk_vector" when
building RemoteQueryRecord in the handler; change it to preserve the source
index family by reading the engine record's field (e.g.,
record.get("index_family") or record.get("family")) and fall back to
"chunk_vector" if missing, so update the RemoteQueryRecord construction in the
handler that builds the record (the block creating RemoteQueryRecord with
document_id from _extract_document_id) to use the record-sourced value instead
of the literal "chunk_vector".
- Around line 180-184: The code uses the URL basename (file_name from
parsed_url.path) as the source filename which can be a numeric attachment id for
backend_attachment_stream URLs; update the logic around parsed_url, file_name
and suffix so that if the path contains "backend_attachment_stream" or the
derived file_name is purely numeric (file_name.isdigit()), you do NOT treat it
as a real filename — instead return a neutral value (e.g., None or empty string)
or fall back to other metadata; otherwise keep the existing behavior of
returning file_name and suffix. Ensure the checks are applied where parsed_url,
file_name and suffix are computed so numeric/attachment-id basenames are
excluded from document metadata.
In `@shared/models/knowledge_runtime_protocol.py`:
- Around line 21-27: The auth_token fields in BackendAttachmentStreamContentRef
and the similar download model should not be plain str (they risk leaking in
repr/model_dump); change their type to pydantic SecretStr (or an excluded field
with exclude=True in the model config) and update any code that reads the token
to explicitly unwrap it at the transport edge (e.g., in
knowledge_runtime/services/content_fetcher.py and any header builder functions)
so callers still provide the raw token but it is only revealed when creating
request headers; ensure model serialization excludes the secret by default.
---
Nitpick comments:
In `@knowledge_engine/knowledge_engine/storage/elasticsearch_backend.py`:
- Around line 459-542: list_documents is too long; extract the query
construction, aggregation parsing and pagination into small helpers: create a
helper build_search_body(knowledge_id, size=0) that returns the search_body dict
(used where search_body is built), a helper
parse_aggregation_buckets(aggregations) that converts
response["aggregations"]["documents"]["buckets"] into the all_docs list (extract
doc_ref, source_file, chunk_count, created_at) and a helper
paginate_documents(all_docs, page, page_size) that returns the paginated slice
and total; then have list_documents call Elasticsearch.search with
build_search_body, pass response["aggregations"] to parse_aggregation_buckets,
and pass the result to paginate_documents before returning the final dict.
In `@knowledge_engine/tests/test_retrieval_filters.py`:
- Around line 8-24: Add unit tests exercising edge cases for
build_elasticsearch_filters: (1) verify already-lowercase operators (e.g.,
operator "eq") produce expected filters to guard against regressions by
asserting the corresponding term filter appears; (2) verify unknown/invalid
operators are silently ignored by calling build_elasticsearch_filters with a
condition using an unknown operator and asserting only the base knowledge_id
filter is returned; and (3) verify empty or missing condition operator defaults
to "eq" by omitting the operator field (or using empty string) and asserting a
term filter is produced for that key. Use the existing test file/test naming
style and reference build_elasticsearch_filters in each new test.
In `@knowledge_runtime/knowledge_runtime/services/handlers.py`:
- Around line 28-110: The handler entrypoints index_document, query,
delete_document_index, and test_connection are missing OpenTelemetry tracing:
add the `@trace_async` decorator to each async method (use descriptive span_name
like "handlers.index_document" and appropriate tracer_name) and instrument
important steps inside each function with span helpers (call add_span_event and
set_span_attribute around key operations such as storage_backend creation,
service/QueryExecutor instantiation, and external calls like
service.index_document_from_binary, executor.execute, service.delete_document,
and storage_backend.test_connection) so traces contain attributes like
knowledge_base_id, document_id, retrieval_config, and success; ensure
extract_attributes passed to trace_async captures request metadata needed for
indexing/querying.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f6cc0853-b0d9-4405-914d-513d7b9c19c2
📒 Files selected for processing (24)
- backend/app/api/endpoints/internal/rag_content.py
- backend/app/services/rag/remote_gateway.py
- backend/app/services/rag/runtime_resolver.py
- backend/tests/api/endpoints/internal/test_rag_content_endpoint.py
- backend/tests/services/rag/test_remote_gateway.py
- backend/tests/services/rag/test_runtime_resolver.py
- docs/plans/2026-04-04-rag-service-extraction-implementation-plan.md
- docs/specs/knowledge/2026-04-03-rag-service-extraction-design.md
- knowledge_engine/knowledge_engine/embedding/custom.py
- knowledge_engine/knowledge_engine/embedding/factory.py
- knowledge_engine/knowledge_engine/query/executor.py
- knowledge_engine/knowledge_engine/retrieval/filters.py
- knowledge_engine/knowledge_engine/storage/elasticsearch_backend.py
- knowledge_engine/tests/storage/test_elasticsearch_backend.py
- knowledge_engine/tests/test_embedding_factory.py
- knowledge_engine/tests/test_query_executor.py
- knowledge_engine/tests/test_retrieval_filters.py
- knowledge_runtime/knowledge_runtime/core/config.py
- knowledge_runtime/knowledge_runtime/services/handlers.py
- knowledge_runtime/tests/conftest.py
- knowledge_runtime/tests/test_config.py
- knowledge_runtime/tests/test_query_execution.py
- shared/models/knowledge_runtime_protocol.py
- shared/tests/test_knowledge_runtime_protocol.py
✅ Files skipped from review due to trivial changes (1)
- docs/plans/2026-04-04-rag-service-extraction-implementation-plan.md
🚧 Files skipped from review as they are similar to previous changes (7)
- knowledge_engine/knowledge_engine/retrieval/filters.py
- knowledge_engine/tests/test_query_executor.py
- knowledge_runtime/tests/test_query_execution.py
- knowledge_engine/knowledge_engine/query/executor.py
- backend/tests/api/endpoints/internal/test_rag_content_endpoint.py
- knowledge_engine/knowledge_engine/embedding/factory.py
- backend/tests/services/rag/test_runtime_resolver.py
def _binary_stream(binary_data: bytes) -> Iterator[bytes]:
    yield binary_data
This isn't actually streaming yet.
backend/app/services/context/context_service.py returns bytes from get_attachment_binary_data(), so Line 70 has already materialized the whole attachment and Line 85 just yields that buffer once. Large attachments will therefore spike Backend memory on the new remote-indexing path; this needs a chunked/file-like reader from the storage layer instead of wrapping a single bytes object.
Also applies to: 70-85
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/api/endpoints/internal/rag_content.py` around lines 28 - 29, The
current _binary_stream yields a single bytes buffer because
get_attachment_binary_data() materializes the entire attachment; change the
streaming contract so get_attachment_binary_data() (in ContextService) returns a
chunked iterator or file-like object and update _binary_stream to iterate/read
from that stream in fixed-size chunks (e.g., 8KB) and yield each chunk, and
update the caller in rag_content.py to pass the new iterator/file-like instead
of a bytes object; ensure any storage backends implement the new
stream-returning method so large attachments are streamed rather than loaded
fully into memory.
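A minimal sketch of the chunked generator this asks for, assuming get_attachment_binary_data() is reworked to return a file-like object; the BinaryIO contract and the 8 KB chunk size come from the prompt and are assumptions, not the current API:

from typing import BinaryIO, Iterator

CHUNK_SIZE = 8 * 1024  # 8 KB per chunk, as suggested above

def _binary_stream(source: BinaryIO) -> Iterator[bytes]:
    """Yield the attachment in fixed-size chunks instead of one large buffer."""
    while True:
        chunk = source.read(CHUNK_SIZE)
        if not chunk:  # empty bytes signals EOF
            break
        yield chunk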
kb, has_access = KnowledgeService.get_knowledge_base(
    db=db,
    knowledge_base_id=knowledge_base_id,
    user_id=user_id,
)
if kb is None or not has_access:
    raise ValueError(
        f"Knowledge base {knowledge_base_id} not found or access denied"
    )

return QueryRuntimeSpec(
    knowledge_base_ids=[knowledge_base_id],
    query=query,
    max_results=max_results,
    route_mode="rag_retrieval",
    metadata_condition=metadata_condition,
    user_id=user_id,
    user_name=user_name,
    knowledge_base_configs=[
        QueryKnowledgeBaseRuntimeConfig(
            knowledge_base_id=knowledge_base_id,
            index_owner_user_id=kb.user_id,
            retriever_config=self._build_resolved_retriever_config(
                db=db,
                user_id=user_id,
                name=retriever_name,
                namespace=retriever_namespace,
            ),
            embedding_model_config=self._build_resolved_embedding_model_config(
                db=db,
                user_id=user_id,
                model_name=embedding_model_name,
                model_namespace=embedding_model_namespace,
                user_name=user_name,
            ),
            retrieval_config=RuntimeRetrievalConfig(
                top_k=max_results,
                score_threshold=score_threshold,
                retrieval_mode=retrieval_mode,
                vector_weight=vector_weight,
                keyword_weight=keyword_weight,
            ),
Resolve public-query configs with the KB owner, not the requester.
Line 179 stores kb.user_id as the index owner, but Lines 180-192 resolve both the retriever and embedding model with the requester's user_id. For shared/public KBs, a same-named tenant resource can override the KB's actual runtime config, leading to wrong backends or failed queries.
Suggested fix
retriever_config=self._build_resolved_retriever_config(
    db=db,
-    user_id=user_id,
+    user_id=kb.user_id,
    name=retriever_name,
    namespace=retriever_namespace,
),
embedding_model_config=self._build_resolved_embedding_model_config(
    db=db,
-    user_id=user_id,
+    user_id=kb.user_id,
    model_name=embedding_model_name,
    model_namespace=embedding_model_namespace,
    user_name=user_name,
),

Based on learnings, "Kind resource is uniquely identified by THREE fields: namespace, name, and user_id - always use all three fields to locate a Kind".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/services/rag/runtime_resolver.py` around lines 158 - 199, The
code builds QueryRuntimeSpec for a KB but resolves tenant-scoped resources using
the requester's user_id; instead, resolve retriever and embedding model configs
using the KB owner (kb.user_id) so Kind lookup uses the correct (namespace,
name, user_id) tuple; update the calls to _build_resolved_retriever_config and
_build_resolved_embedding_model_config in the QueryRuntimeSpec construction to
pass user_id=kb.user_id (or an explicit index_owner_user_id) so the resource
resolution matches the KB owner rather than the requester.
sidebar_position: 1
---

# Knowledge Runtime 独立化设计
Avoid repeating the sidebar category in the title.
Line 5 currently starts with “Knowledge …”, which likely duplicates the sidebar category context for this page.
✏️ Suggested title tweak
-# Knowledge Runtime 独立化设计
+# Runtime 独立化设计

As per coding guidelines, docs/**/*.md: Document titles should NOT repeat the sidebar category name.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
-# Knowledge Runtime 独立化设计
+# Runtime 独立化设计
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/specs/knowledge/2026-04-03-rag-service-extraction-design.md` at line 5,
The page title currently repeats the sidebar category by starting with
"Knowledge Runtime 独立化设计"; update the title to remove the redundant "Knowledge"
prefix (e.g., change "Knowledge Runtime 独立化设计" to "Runtime 独立化设计" or another
concise non-redundant variant) so it no longer duplicates the sidebar category.
async def _aget_query_embedding(self, query: str) -> list[float]:
    return await asyncio.to_thread(self._get_query_embedding, query)

async def _aget_text_embedding(self, text: str) -> list[float]:
    return await asyncio.to_thread(self._get_text_embedding, text)

@retry(
-    stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=2, max=10)
+    stop=stop_after_attempt(3),
+    wait=wait_exponential(multiplier=1, min=2, max=10),
)
-def _call_api(self, text: str) -> List[float]:
-    """
-    Call external embedding API with retry mechanism.
-
-    Args:
-        text: Text to embed
-
-    Returns:
-        Embedding vector
-
-    Raises:
-        requests.HTTPError: If API call fails after retries
-    """
+def _call_api(self, text: str) -> list[float]:
🛠️ Refactor suggestion | 🟠 Major
Add OpenTelemetry tracing decorators to changed async/sync methods.
These updated methods are still missing tracing decorators required by repository standards.
As per coding guidelines, "Use @trace_async(span_name, tracer_name, extract_attributes) decorator to trace entire async functions in OpenTelemetry" and "Use @trace_sync(span_name, tracer_name, extract_attributes) decorator to trace entire sync functions in OpenTelemetry".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@knowledge_engine/knowledge_engine/embedding/custom.py` around lines 58 - 68,
The async methods _aget_query_embedding and _aget_text_embedding and the sync
method _call_api need OpenTelemetry decorators: add `@trace_async`(...) to both
_aget_query_embedding and _aget_text_embedding and add `@trace_sync`(...) to
_call_api, supplying an appropriate span name (e.g.,
"embedding._aget_query_embedding", "embedding._aget_text_embedding",
"embedding._call_api"), the tracer name used across the repo (use the same
tracer_name constant used elsewhere), and the extract_attributes function used
by your tracing utilities so the full function execution is traced according to
repository standards.
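One placement detail the prompt leaves open: a tracing decorator applied outside @retry produces a single span covering all attempts, while tracing inside the retried body produces one span per attempt. A hedged sketch of the per-attempt variant in plain opentelemetry-api terms, since the repo's trace_sync wrapper is not visible in this excerpt and its name here is an assumption:

from opentelemetry import trace
from tenacity import retry, stop_after_attempt, wait_exponential

tracer = trace.get_tracer("knowledge_engine.embedding")  # placeholder tracer name

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10),
)
def _call_api(self, text: str) -> list[float]:
    # The span sits inside the retry wrapper, so each failed attempt
    # is recorded as its own error span before tenacity retries.
    with tracer.start_as_current_span("embedding._call_api") as span:
        span.set_attribute("embedding.text_length", len(text))
        ...  # existing HTTP call and response parsing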
async def query(self, request: RemoteQueryRequest) -> RemoteQueryResponse:
    metadata_condition = self._combine_metadata_conditions(
        self._build_document_filter(request.document_ids),
        request.metadata_condition,
    )
    records: list[RemoteQueryRecord] = []

    for kb_config in request.knowledge_base_configs:
        storage_backend = create_storage_backend_from_runtime_config(
            kb_config.retriever_config
        )
        embed_model = create_embedding_model_from_runtime_config(
            kb_config.embedding_model_config
        )
        executor = QueryExecutor(
            storage_backend=storage_backend,
            embed_model=embed_model,
        )
        result = await executor.execute(
            knowledge_id=str(kb_config.knowledge_base_id),
            query=request.query,
            retrieval_config=kb_config.retrieval_config,
            metadata_condition=metadata_condition,
            user_id=kb_config.index_owner_user_id,
        )
Do not silently ignore enabled_index_families / retrieval_policy.
Lines 49-73 accept these contract fields but never apply or validate them, so the runtime can acknowledge policies it doesn’t execute.
🛠️ Fail-fast until multi-family/policy execution is wired
async def query(self, request: RemoteQueryRequest) -> RemoteQueryResponse:
+    unsupported_families = [
+        family for family in request.enabled_index_families if family != "chunk_vector"
+    ]
+    if unsupported_families:
+        raise ValueError(
+            f"Unsupported enabled_index_families: {unsupported_families}"
+        )
+    if request.retrieval_policy not in (None, "chunk_only"):
+        raise ValueError(
+            f"Unsupported retrieval_policy: {request.retrieval_policy}"
+        )
+
    metadata_condition = self._combine_metadata_conditions(
        self._build_document_filter(request.document_ids),
        request.metadata_condition,
    )
Verify each finding against the current code and only fix it if needed.
In `@knowledge_runtime/knowledge_runtime/services/handlers.py` around lines 49 -
73, The query handler currently ignores RemoteQueryRequest fields
enabled_index_families and retrieval_policy in the knowledge_base_configs
entries; add a fail-fast validation at the start of the loop in query (or before
executor creation) that checks each kb_config.enabled_index_families and
kb_config.retrieval_policy and raises/returns an explicit error (e.g.,
ValueError or RemoteQueryResponse error) if either is set and not supported,
mentioning the offending knowledge_base_id; reference the query method and the
knowledge_base_configs / kb_config objects to locate where to insert this
validation so the runtime no longer silently accepts unsupported multi-family or
policy requests.
Summary by CodeRabbit
New Features
Improvements