Feature/rag service extraction #927

Open
kissghosts wants to merge 9 commits into wecode-ai:main from kissghosts:feature/rag-service-extraction
Conversation


kissghosts (Collaborator) commented Apr 5, 2026

Summary by CodeRabbit

  • New Features

    • Per-operation RAG runtime mode (local/remote) with graceful remote fallback
    • Internal streaming endpoint for secure attachment downloads (token-based)
    • Remote content refs: presigned URL or backend-streaming for remote indexing
    • New lightweight remote runtime service and packaged execution kernel
  • Improvements

    • More robust retrieval/indexing routing and connection testing
    • Expanded test coverage and documentation for RAG runtime behavior


coderabbitai Bot commented Apr 5, 2026

📝 Walkthrough

The PR extracts a backend-agnostic RAG execution kernel into a new knowledge_engine, adds a thin remote adapter knowledge_runtime, introduces per-operation runtime routing (local/remote) and gateway factories, and adds authenticated internal content streaming and attachment token support. Runtime specs and gateways drive local vs. remote execution selection.
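The per-operation routing described above can be pictured with a minimal sketch. This is illustrative only: `RAG_RUNTIME_MODE`, `LocalRagGateway`, and `RemoteRagGateway` mirror names from the PR, but the shapes and defaults here are assumptions, not the actual implementation.

```python
# Hypothetical sketch of per-operation gateway selection (not the PR's code).
# Assumption: RAG_RUNTIME_MODE maps each operation to "local", "remote", or "auto".

RAG_RUNTIME_MODE = {"index": "remote", "query": "auto", "delete": "local"}


class LocalRagGateway:
    def describe(self) -> str:
        return "local"


class RemoteRagGateway:
    def describe(self) -> str:
        return "remote"


def get_gateway(operation: str):
    """Pick a gateway per operation; "auto" and unknown modes fall back to local here."""
    mode = RAG_RUNTIME_MODE.get(operation, "local")
    if mode == "remote":
        return RemoteRagGateway()
    return LocalRagGateway()
```

In this sketch "auto" resolves to local for simplicity; in the PR, "auto" is finalized per request (e.g. by the route-mode decision in the retrieval service).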

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Core packages & new engines — `knowledge_engine/knowledge_engine/__init__.py`, `knowledge_engine/pyproject.toml`, `knowledge_engine/knowledge_engine/*` | Add new `knowledge_engine` package implementing indexing, splitting, embedding factory, query executor, storage backends (Elasticsearch/Qdrant/Milvus), and public exports. |
| Remote runtime service — `knowledge_runtime/knowledge_runtime/__init__.py`, `knowledge_runtime/pyproject.toml`, `knowledge_runtime/knowledge_runtime/main.py`, `knowledge_runtime/knowledge_runtime/api/*`, `knowledge_runtime/knowledge_runtime/core/config.py`, `knowledge_runtime/knowledge_runtime/services/*`, `knowledge_runtime/knowledge_runtime/security.py`, `docker/knowledge_runtime/Dockerfile`, `knowledge_runtime/start.sh` | Introduce `knowledge_runtime` FastAPI service with internal `/internal/rag/*` endpoints, internal-token auth, content fetcher, handlers delegating to `knowledge_engine`, and containerization. |
| Shared runtime protocol — `shared/models/knowledge_runtime_protocol.py`, `shared/models/__init__.py` | Add strict Pydantic transport models (`ContentRef` union, RemoteIndex/Query/Delete/Test models, runtime configs, `RemoteRagError`) for Backend ↔ knowledge_runtime communication. |
| Backend config & auth tokens — `backend/app/core/config.py`, `backend/app/services/auth/rag_download_token.py`, `backend/app/services/auth/__init__.py` | Add `KNOWLEDGE_RUNTIME_URL` and `RAG_RUNTIME_MODE` with parsing/lookup, plus short-lived JWT create/verify for attachment download tokens, exported from the auth package. |
| Gateway, runtime specs, factories — `backend/app/services/rag/runtime_resolver.py`, `backend/app/services/rag/runtime_specs.py`, `backend/app/services/rag/gateway.py`, `backend/app/services/rag/gateway_factory.py`, `backend/app/services/rag/remote_gateway.py`, `backend/app/services/rag/local_gateway.py` | Centralize runtime-spec construction (index/query/delete), introduce Delete/Connection test specs, update `RagGateway` protocol signatures, add a `RemoteRagGateway` client with error translation, and add a gateway factory to select Local/Remote per operation. |
| Backend local data-plane & service changes — `backend/app/services/rag/local_data_plane/*`, `backend/app/services/rag/content_refs.py`, `backend/app/services/rag/retrieval_service.py`, `backend/app/services/rag/embedding/factory.py`, `backend/app/services/rag/storage/factory.py`, `backend/app/services/knowledge/*` | Refactor local data-plane to delegate embedding/storage creation to engine factories; add content-ref builder (presigned vs backend stream); adapt retrieval/index/delete flows to runtime specs and gateway selection. |
| API surface & endpoints — `backend/app/api/api.py`, `backend/app/api/endpoints/internal/rag.py`, `backend/app/api/endpoints/internal/rag_content.py`, `backend/app/api/endpoints/rag.py`, `backend/app/api/endpoints/adapter/retrievers.py` | Register new internal `/rag/content` streaming route with token auth; make `/internal/rag/retrieve` use the gateway-based runtime-spec flow with remote fallback; register the internal router unconditionally; convert retriever test-connection to an async gateway-based flow; remove legacy upload/delete/list endpoints. |
| Removed backend implementations — `backend/app/services/rag/document_service.py`, `backend/app/services/rag/index/*`, `backend/app/services/rag/splitter/*`, `backend/app/services/rag/retrieval/retriever.py` | Delete previous Backend-local indexing/splitter/retriever implementations; their responsibilities moved to `knowledge_engine`. |
| Schemas & docs — `backend/app/schemas/rag.py`, docs under `docs/*`, `backend/app/services/rag/README.md` | Remove document upload/list schemas and update docs to reflect the runtime boundary, direct-injection vs retrieval routing, and `knowledge_engine`/`knowledge_runtime` roles. |
| Tests & CI updates — `backend/tests/**`, `knowledge_engine/tests/**`, `knowledge_runtime/tests/**`, `shared/tests/**` | Add/adjust tests for runtime-mode routing, the `RemoteRagGateway` client, content streaming auth tokens, storage backends, QueryExecutor, embedding factory, and import/packaging checks. |
| Docker / compose / deps — `docker/knowledge_runtime/Dockerfile`, `docker-compose.yml`, `backend/pyproject.toml` | Add the `knowledge_runtime` service in compose, add the `wegent-knowledge-engine` dependency and workspace mapping, and update formatter configs. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Backend
    participant Gateway as RagGateway
    participant Local as LocalRagGateway
    participant Remote as RemoteRagGateway
    participant KR as knowledge_runtime
    participant KE as knowledge_engine
    participant Storage

    rect rgba(200,150,255,0.5)
    Note over Client,Storage: Local retrieval flow
    Client->>Backend: POST /api/internal/rag/retrieve
    Backend->>Backend: build QueryRuntimeSpec (may finalize route_mode)
    Backend->>Gateway: get_query_gateway()
    Gateway-->>Backend: LocalRagGateway
    Backend->>Local: LocalRagGateway.query(spec, db)
    Local->>KE: QueryExecutor.execute(...)
    KE->>Storage: retrieve(...)
    Storage-->>KE: records
    KE-->>Local: records
    Local-->>Backend: records
    Backend-->>Client: { records: [...] }
    end
sequenceDiagram
    participant Client
    participant Backend
    participant Gateway as RagGateway
    participant Remote as RemoteRagGateway
    participant KR as knowledge_runtime
    participant KE as knowledge_engine
    participant Storage

    rect rgba(255,200,150,0.5)
    Note over Client,Storage: Remote retrieval flow with fallback
    Client->>Backend: POST /api/internal/rag/retrieve
    Backend->>Backend: build QueryRuntimeSpec (route_mode may be "auto")
    Backend->>Gateway: get_query_gateway()
    Gateway-->>Backend: RemoteRagGateway
    Backend->>Remote: POST /internal/rag/query (RemoteRagGateway.query)
    Remote->>KR: runtime handler validates and fetches content (if needed)
    KR->>KE: QueryExecutor.execute(...)
    KE->>Storage: retrieve(...)
    Storage-->>KE: records
    KE-->>KR: records
    KR-->>Remote: RemoteQueryResponse
    Remote-->>Backend: RemoteQueryResponse
    alt remote raises RemoteRagGatewayError
      Backend->>Local: LocalRagGateway.query(spec, db)  -- fallback
      Local-->>Backend: records
    end
    Backend-->>Client: { records: [...] }
    end
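The fallback branch in the diagram above reduces to a small try/except around the remote call. This is a sketch, not the PR's handler: the gateway classes and `RemoteRagGatewayError` reuse PR names, but their bodies here are stand-ins (the remote gateway is hard-wired to fail so the fallback path is visible).

```python
# Sketch of remote-query-with-local-fallback, assuming RemoteRagGatewayError is
# the error type the remote client raises on transport/runtime failures.
import asyncio


class RemoteRagGatewayError(Exception):
    pass


class RemoteRagGateway:
    async def query(self, spec: dict) -> list[dict]:
        raise RemoteRagGatewayError("runtime unreachable")  # simulated outage


class LocalRagGateway:
    async def query(self, spec: dict) -> list[dict]:
        return [{"text": "local result", "source": "local"}]


async def query_with_fallback(spec: dict) -> list[dict]:
    try:
        return await RemoteRagGateway().query(spec)
    except RemoteRagGatewayError:
        # Mirrors the diagram's alt branch: remote failure degrades to local.
        return await LocalRagGateway().query(spec)
```

The client never observes the remote failure; it only sees records, which matches the "graceful remote fallback" claim in the summary.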

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

🐰 I bounced through specs and routers anew,

Tokens and streams, a hop and a chew.
Engines now hum where splitters once played,
A runtime that fetches, a gateway that swayed.
Hooray — modular carrots laid out in view!



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 16

Note

Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.

🟡 Minor comments (9)
docs/plans/2026-04-04-rag-service-extraction-implementation-plan.md-94-95 (1)

94-95: ⚠️ Potential issue | 🟡 Minor

Replace local absolute paths with repository-relative doc links.

Lines 94-95 currently link to /Users/..., which will break for everyone except on the original author's machine.

🔗 Proposed fix
-- Modify: [2026-04-03-rag-service-extraction-design.md](/Users/yanhe1/iDev/weibo/wegent/docs/specs/knowledge/2026-04-03-rag-service-extraction-design.md)
-- Modify: [2026-04-04-rag-service-extraction-implementation-plan.md](/Users/yanhe1/iDev/weibo/wegent/docs/plans/2026-04-04-rag-service-extraction-implementation-plan.md)
+- Modify: [2026-04-03-rag-service-extraction-design.md](../specs/knowledge/2026-04-03-rag-service-extraction-design.md)
+- Modify: [2026-04-04-rag-service-extraction-implementation-plan.md](./2026-04-04-rag-service-extraction-implementation-plan.md)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/plans/2026-04-04-rag-service-extraction-implementation-plan.md` around
lines 94 - 95, The links in the diff point to absolute local paths (e.g.,
/Users/yanhe1/.../2026-04-03-rag-service-extraction-design.md and
/Users/yanhe1/.../2026-04-04-rag-service-extraction-implementation-plan.md);
update those to repository-relative links (e.g.,
docs/specs/knowledge/2026-04-03-rag-service-extraction-design.md and
docs/plans/2026-04-04-rag-service-extraction-implementation-plan.md) so the
references work for all contributors and CI; edit the link targets in the file
that contains the two links to replace the absolute paths with the repo-relative
paths while keeping the same display text.
backend/tests/services/rag/test_retrieval_service.py-334-336 (1)

334-336: ⚠️ Potential issue | 🟡 Minor

Add return type hints to the new test methods.

Both new methods should declare -> None to comply with the Python typing rule.

🔧 Suggested typing fix
     def test_decide_route_mode_for_chat_shell_returns_rag_retrieval_without_budget(
         self,
-    ):
+    ) -> None:
@@
     def test_decide_route_mode_for_chat_shell_returns_direct_injection_when_auto_fits(
         self,
-    ):
+    ) -> None:
As per coding guidelines, "Python: Follow PEP 8, Black formatter (line length: 88), isort, type hints required".

Also applies to: 352-354

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/tests/services/rag/test_retrieval_service.py` around lines 334 - 336,
Add explicit return type hints "-> None" to the new test function definitions so
they conform to the project's typing rules; update the signature of
test_decide_route_mode_for_chat_shell_returns_rag_retrieval_without_budget (and
the other new test method around lines 352-354) to include "-> None" after the
parameter list, keeping the rest of the signature unchanged.
backend/app/services/rag/storage/factory.py-166-189 (1)

166-189: ⚠️ Potential issue | 🟡 Minor

Fail fast when runtime storage URL is missing.

The new runtime path accepts url=None and passes it to backend constructors, which can fail later with less actionable errors. Validate it here and raise a clear ValueError.

🔧 Suggested fix
 def create_storage_backend_from_runtime_config(
     retriever_config: RuntimeRetrieverConfig,
 ) -> BaseStorageBackend:
     """Create storage backend from resolved runtime retriever config."""
     storage_config = retriever_config.storage_config or {}
     storage_type = storage_config.get("type", "").lower()

     if storage_type not in STORAGE_BACKEND_REGISTRY:
         raise ValueError(
             f"Unsupported storage type: {storage_type}. "
             f"Supported types: {list(STORAGE_BACKEND_REGISTRY.keys())}"
         )
+    if not storage_config.get("url"):
+        raise ValueError("Storage url is required in runtime storage_config")

     config = {
         "url": storage_config.get("url"),
         "username": storage_config.get("username"),
         "password": storage_config.get("password"),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/rag/storage/factory.py` around lines 166 - 189, The
runtime storage factory create_storage_backend_from_runtime_config currently
allows storage_config.get("url") to be None and forwards that to backend
constructors; add a validation after computing storage_type (and after ensuring
storage_type is supported) that checks the resolved URL
(storage_config.get("url")) is non-empty/ non-None and raise a clear ValueError
like "Missing storage URL for storage_type '<type>'" if absent; then proceed to
build the config dict (with "url") and instantiate backend_class from
STORAGE_BACKEND_REGISTRY as before.
backend/app/services/rag/local_gateway.py-45-47 (1)

45-47: ⚠️ Potential issue | 🟡 Minor

Add explicit db guard for parity with other gateway methods.

index_document and query fail fast when db is missing; delete_document_index should do the same for consistent error behavior.

🔧 Suggested fix
     async def delete_document_index(
         self,
         spec: DeleteRuntimeSpec,
         *,
         db: Session,
     ) -> dict:
+        if db is None:
+            raise ValueError("db is required for LocalRagGateway.delete_document_index")
         return await self._delete_executor(spec, db=db)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/rag/local_gateway.py` around lines 45 - 47, The
delete_document_index method currently forwards to _delete_executor without
validating the db parameter; add an explicit guard at the start of
delete_document_index (the same way index_document and query do) to raise an
immediate error (e.g., throw a ValueError or appropriate framework exception)
when db is missing or falsy, then call and return await
self._delete_executor(spec, db=db).
knowledge_engine/tests/test_document_service.py-45-55 (1)

45-55: ⚠️ Potential issue | 🟡 Minor

Weak assertion for created_at field.

Line 52 asserts "created_at": result["created_at"], which always passes since it compares the result to itself. This doesn't validate that a timestamp was actually generated.

Proposed fix using isinstance or ANY matcher
+from unittest.mock import ANY
+
     assert result == {
         "doc_ref": "9",
         "knowledge_id": "1",
         "source_file": "report.pdf",
         "chunk_count": 2,
         "index_name": "wegent_kb_1",
         "status": "success",
-        "created_at": result["created_at"],
+        "created_at": ANY,
         "chunks_data": [{"chunk_index": 0}, {"chunk_index": 1}],
         "indexed_count": 2,
     }
+    # Optionally verify it's a valid ISO timestamp
+    assert isinstance(result["created_at"], str)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@knowledge_engine/tests/test_document_service.py` around lines 45 - 55, The
test currently compares "created_at" to itself which always succeeds; update the
assertion in test_document_service.py (the result dict for variable result) to
verify created_at is present and a valid timestamp (e.g., assert "created_at" in
result and that result["created_at"] is a string parsable by
datetime.fromisoformat or matches an ISO/timestamp regex, or use a matcher like
unittest.mock.ANY) so the test actually validates a timestamp was generated.
docs/specs/knowledge/2026-04-03-rag-service-extraction-design.md-333-360 (1)

333-360: ⚠️ Potential issue | 🟡 Minor

Normalize subsection heading depth to project docs hierarchy.

Line 333 and Line 347 use ####, but this repo’s docs convention caps subsection depth at ###.

📄 Suggested doc fix
-#### `backend_attachment_stream`
+### `backend_attachment_stream`

...

-#### `presigned_url`
+### `presigned_url`

As per coding guidelines, "Use consistent heading hierarchy: # for title, ## for sections, ### for subsections".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/specs/knowledge/2026-04-03-rag-service-extraction-design.md` around
lines 333 - 360, The subsection headings for `backend_attachment_stream` and
`presigned_url` use four-level headers (####) but must follow the repo
convention that caps subsections at three-level headers (###); update those two
headings from `####` to `###` so they match the project's hierarchy and restore
consistent heading depth across the document.
backend/app/services/rag/content_refs.py-29-45 (1)

29-45: ⚠️ Potential issue | 🟡 Minor

Validate expires_delta_seconds to prevent invalid/expired content refs.

Line 40 can produce already-expired refs when expires_delta_seconds <= 0. Add a fail-fast guard.

🧪 Suggested fix
 def build_content_ref_for_attachment(
     *,
     db: Session,
     attachment_id: int,
     expires_delta_seconds: int = DEFAULT_CONTENT_REF_TTL_SECONDS,
 ) -> BackendAttachmentStreamContentRef | PresignedUrlContentRef:
     """Build a content reference for an attachment used by remote indexing."""
+    if expires_delta_seconds <= 0:
+        raise ValueError("expires_delta_seconds must be greater than 0")
 
     context = context_service.get_context_optional(
         db=db,
         context_id=attachment_id,
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/rag/content_refs.py` around lines 29 - 45, The code must
fail fast when expires_delta_seconds is non-positive to avoid creating
already-expired refs: in the function that returns
BackendAttachmentStreamContentRef | PresignedUrlContentRef, validate the
expires_delta_seconds parameter (defaults to DEFAULT_CONTENT_REF_TTL_SECONDS)
right after the signature and before calling
context_service.get_context_optional; if expires_delta_seconds <= 0 raise a
ValueError (e.g., "expires_delta_seconds must be positive") and otherwise
proceed to compute expires_at and call context_service.get_attachment_url;
reference the symbols expires_delta_seconds, DEFAULT_CONTENT_REF_TTL_SECONDS,
expires_at, context_service.get_context_optional, and
context_service.get_attachment_url when making the change.
shared/tests/test_knowledge_runtime_protocol.py-73-90 (1)

73-90: ⚠️ Potential issue | 🟡 Minor

Isolate unknown content_ref.kind as the sole failing validation.

Lines 82 and 83 provide incomplete nested configs that lack required fields (name for retriever, model_name for embedding). This causes the test to raise ValidationError before validating the content_ref.kind discriminator, obscuring the test's intended assertion.

Fix: Supply complete, schema-conformant configs
     with pytest.raises(ValidationError):
         remote_index_request.model_validate(
             {
                 "knowledge_base_id": 11,
                 "document_id": 22,
                 "index_owner_user_id": 33,
-                "retriever_config": {"provider": "milvus"},
-                "embedding_model_config": {"model": "text-embedding-3-large"},
+                "retriever_config": {
+                    "name": "retriever-a",
+                    "namespace": "default",
+                    "storage_config": {"type": "milvus", "url": "http://milvus:19530"},
+                },
+                "embedding_model_config": {
+                    "model_name": "text-embedding-3-large",
+                    "model_namespace": "default",
+                    "resolved_config": {"protocol": "openai"},
+                },
                 "index_families": ["chunk_vector"],
                 "content_ref": {
                     "kind": "unsupported_kind",
                     "url": "http://backend:8000/api/internal/rag/content/22",
                 },
             }
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shared/tests/test_knowledge_runtime_protocol.py` around lines 73 - 90, The
test test_remote_index_request_rejects_unknown_content_ref_kind currently fails
earlier because retriever_config and embedding_model_config are incomplete;
update the remote_index_request.model_validate payload to supply
schema-conformant configs (e.g., include the required retriever field like
"name" in retriever_config and the required embedding field like "model_name" or
"model" in embedding_model_config) so ValidationError is triggered solely by the
unsupported content_ref.kind discriminator; ensure you keep the same content_ref
with "kind": "unsupported_kind" and reference the
RemoteIndexRequest/remote_index_request symbols when locating the test.
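The discriminator behavior this test isolates can be shown without Pydantic. The hand-rolled check below is only an analogue of what a tagged union with a `kind` discriminator does — dispatch on `kind` and reject unknown values — and the kind names are taken from the PR's `presigned_url` / `backend_attachment_stream` refs; everything else is a stand-in.

```python
# Minimal analogue of a discriminated-union check on content_ref.kind.
# Assumption: the real models are Pydantic; this sketch only shows the
# "reject unknown discriminator" behavior the test above targets.

CONTENT_REF_KINDS = {"presigned_url", "backend_attachment_stream"}


def validate_content_ref(data: dict) -> dict:
    """Accept only known content_ref kinds; fail fast on anything else."""
    kind = data.get("kind")
    if kind not in CONTENT_REF_KINDS:
        raise ValueError(f"unsupported content_ref kind: {kind!r}")
    if not data.get("url"):
        raise ValueError("content_ref requires a url")
    return data
```

The point of the review comment is that this check should be the *only* thing failing in the test, which is why the sibling configs must be made schema-conformant first.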
docker-compose-rag.yml-21-22 (1)

21-22: ⚠️ Potential issue | 🟡 Minor

Healthcheck command exposes password in process listing.

The -p123456 flag makes the root password visible via ps aux. Use environment variable reference instead:

   healthcheck:
-    test: ["CMD", "mysqladmin", "ping", "-h", "localhost", "-u", "root", "-p123456"]
+    test: ["CMD-SHELL", "mysqladmin ping -h localhost -u root -p$$MYSQL_ROOT_PASSWORD"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker-compose-rag.yml` around lines 21 - 22, The healthcheck currently
exposes the root password by using the literal "-p123456" in the healthcheck
test; change the healthcheck for the MySQL service to read the password from the
environment (e.g., MYSQL_ROOT_PASSWORD) instead of a hardcoded value — update
the healthcheck "test" entry to invoke a shell (sh -c) so the environment
variable is expanded and pass the password as -p"$MYSQL_ROOT_PASSWORD" or use a
client option file via --defaults-extra-file, ensuring the literal password is
removed from the test command.
🧹 Nitpick comments (18)
backend/tests/services/rag/test_retrieval_service.py (1)

334-351: Strengthen no-budget route test by asserting estimator is not called.

This test currently validates only the returned mode. It should also guard against regressions where token estimation is still executed when context_window is None.

✅ Suggested test hardening
-    def test_decide_route_mode_for_chat_shell_returns_rag_retrieval_without_budget(
-        self,
-    ):
+    def test_decide_route_mode_for_chat_shell_returns_rag_retrieval_without_budget(
+        self,
+    ) -> None:
         from app.services.rag.retrieval_service import RetrievalService

         service = RetrievalService()
         db = MagicMock()

-        result = service.decide_route_mode_for_chat_shell(
-            query="test",
-            knowledge_base_ids=[123],
-            db=db,
-            route_mode="auto",
-            context_window=None,
-        )
+        with patch.object(
+            RetrievalService,
+            "_estimate_total_tokens_for_knowledge_bases",
+        ) as mock_estimate:
+            result = service.decide_route_mode_for_chat_shell(
+                query="test",
+                knowledge_base_ids=[123],
+                db=db,
+                route_mode="auto",
+                context_window=None,
+            )

         assert result == "rag_retrieval"
+        mock_estimate.assert_not_called()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/tests/services/rag/test_retrieval_service.py` around lines 334 - 351,
The test
test_decide_route_mode_for_chat_shell_returns_rag_retrieval_without_budget
should also assert that no token estimation is performed when context_window is
None; update the test to mock or spy the estimator used by
RetrievalService.decide_route_mode_for_chat_shell (e.g., patch
RetrievalService.estimate_token_count or the external TokenEstimator
class/function the method calls) and add an assertion that that mock was not
called, while leaving the existing assertion that the result == "rag_retrieval".
backend/tests/services/rag/test_content_refs.py (1)

11-26: Extract test magic numbers to named constants.

Please replace inline literals (101, 0, 7, 42) with constants to keep intent clear and simplify future updates.
As per coding guidelines, "Python: Extract magic numbers to constants".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/tests/services/rag/test_content_refs.py` around lines 11 - 26,
Replace the inline magic numbers in _create_attachment_context with module-level
named constants: define constants such as DEFAULT_ATTACHMENT_ID = 101,
DEFAULT_SUBTASK_ID = 0, DEFAULT_USER_ID = 7, and DEFAULT_FILE_SIZE = 42 at the
top of the test file and use them in the SubtaskContext construction (replace
101, 0, 7, 42). Update any calls to _create_attachment_context that rely on the
default attachment_id to continue passing explicit values when needed; keep the
constant names descriptive and all-caps per project style.
backend/tests/services/test_summary_service.py (1)

442-444: Remove redundant assertion to keep the test focused.

assert_called_once_with(...) already validates knowledge_base_id, so the extra kwargs assertion is duplicate noise.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/tests/services/test_summary_service.py` around lines 442 - 444,
Remove the redundant kwargs assertion that checks
mock_build_delete_runtime_spec.call_args.kwargs["knowledge_base_id"] ==
test_knowledge_base.id; the existing assert_called_once_with(...) on
mock_build_delete_runtime_spec already verifies the same parameter. Locate the
assertion referencing mock_build_delete_runtime_spec.call_args.kwargs and delete
that line so the test only uses
mock_build_delete_runtime_spec.assert_called_once_with(...) to validate the
call.
backend/app/api/endpoints/internal/rag.py (3)

170-174: Add type hints to helper function.

The helper function lacks type hints, which are required per coding guidelines.

Proposed type hints
-def _resolve_query_gateway(runtime_spec):
+def _resolve_query_gateway(runtime_spec) -> RagGateway:
     route_mode = getattr(runtime_spec, "route_mode", "auto")
     if route_mode == "rag_retrieval":
         return get_query_gateway()
     return LocalRagGateway()

Note: You'll need to import RagGateway from app.services.rag.gateway if not already imported.

As per coding guidelines, "Python: ... type hints required".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/api/endpoints/internal/rag.py` around lines 170 - 174, Add type
hints to _resolve_query_gateway by annotating its parameter and return type:
import Any from typing and annotate runtime_spec: Any, and set the function
return type to RagGateway (import RagGateway from app.services.rag.gateway).
Ensure the body still checks route_mode with getattr and returns
get_query_gateway() or LocalRagGateway() but with the function signature updated
to def _resolve_query_gateway(runtime_spec: Any) -> RagGateway:.

203-213: Add type hints and consider logging level.

  1. Missing return type hint
  2. The fallback path logs at WARNING level, which is appropriate for transient failures
Proposed type hints
-async def _execute_query_with_remote_fallback(runtime_spec, db: Session):
+async def _execute_query_with_remote_fallback(runtime_spec, db: Session) -> dict:
     rag_gateway = _resolve_query_gateway(runtime_spec)

As per coding guidelines, "Python: ... type hints required".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/api/endpoints/internal/rag.py` around lines 203 - 213, Add
explicit type hints to _execute_query_with_remote_fallback: annotate the
function signature with a return type (e.g., async def
_execute_query_with_remote_fallback(runtime_spec: RuntimeSpec, db: Session) ->
RagQueryResult) or, if RagQueryResult/RuntimeSpec aren't defined yet, use
typing.Any for runtime_spec and the return type and then replace with concrete
types later; ensure necessary imports (RuntimeSpec, RagQueryResult or
typing.Any) are added at the top. Keep the existing logger.warning level as-is
for the RemoteRagGatewayError except path; do not change the fallback behavior
of LocalRagGateway().query or the use of _resolve_query_gateway and
RemoteRagGatewayError.

177-200: Add type hints and consider simplifying attribute checks.

  1. Missing type hints on _finalize_query_runtime_spec
  2. The hasattr check at line 187 followed by getattr at line 191 is slightly redundant
Proposed improvements
-def _finalize_query_runtime_spec(runtime_spec, db: Session):
+def _finalize_query_runtime_spec(runtime_spec, db: Session):  # -> QueryRuntimeSpec
     if getattr(runtime_spec, "route_mode", "auto") != "auto":
         return runtime_spec
     required_attributes = (
         "query",
         "knowledge_base_ids",
         "document_ids",
         "direct_injection_budget",
         "model_copy",
     )
     if not all(hasattr(runtime_spec, attr) for attr in required_attributes):
         return runtime_spec

     retrieval_service = RetrievalService()
-    budget = getattr(runtime_spec, "direct_injection_budget", None)
+    budget = runtime_spec.direct_injection_budget  # Safe after hasattr check
     resolved_route_mode = retrieval_service.decide_route_mode_for_chat_shell(

As per coding guidelines, "Python: ... type hints required".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/api/endpoints/internal/rag.py` around lines 177 - 200, Add
explicit type hints to _finalize_query_runtime_spec (e.g., runtime_spec: Any |
YourRuntimeSpecType, db: Session) and a proper return type matching
runtime_spec; replace the hasattr(...) checks with a single concise existence
check using getattr(runtime_spec, attr, None) for each required attribute (e.g.,
if not all(getattr(runtime_spec, a, None) is not None for a in
required_attributes): return runtime_spec) to avoid redundant hasattr/getattr,
and keep the subsequent budget = getattr(runtime_spec,
"direct_injection_budget", None) and the call to
RetrievalService().decide_route_mode_for_chat_shell(...) as-is before returning
runtime_spec.model_copy(update={"route_mode": resolved_route_mode}).
backend/app/services/rag/embedding/factory.py (1)

221-224: Redundant isinstance checks for custom_headers.

The parameter custom_headers is already typed as dict[str, Any] (line 214), and callers at lines 186 and 202 already ensure it's a dict. The isinstance(custom_headers, dict) checks at lines 222-224 and 267 are redundant.

Proposed simplification
         if (
             custom_headers
-            and isinstance(custom_headers, dict)
             and len(custom_headers) > 0
         ):
-            headers=custom_headers if isinstance(custom_headers, dict) else {},
+            headers=custom_headers,

Also applies to: 267-267

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/rag/embedding/factory.py` around lines 221 - 224, The
conditional branches in embedding factory that check "isinstance(custom_headers,
dict)" are redundant because custom_headers is typed as dict[str, Any] and
callers already validate it; update the conditions that currently read something
like "if custom_headers and isinstance(custom_headers, dict) and
len(custom_headers) > 0" (references: variable custom_headers in the embedding
factory) to a simpler truthy check (e.g., "if custom_headers:" or "if
custom_headers and len(custom_headers) > 0" if explicit emptiness is desired),
and remove the duplicate isinstance check elsewhere (the second occurrence
around the same custom_headers usage) so the logic relies on the type
annotation/call-site guarantees rather than runtime isinstance checks.
backend/app/services/rag/gateway_factory.py (1)

13-17: Consider caching gateway instances.

Each call to get_*_gateway() creates a new gateway instance. If gateways are stateless and instantiation is cheap, this is fine. However, if RemoteRagGateway or LocalRagGateway hold connections or expensive resources, consider caching instances per mode.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/rag/gateway_factory.py` around lines 13 - 17, The
_build_gateway factory currently creates a new RemoteRagGateway or
LocalRagGateway on each call; change it to cache instances per normalized mode
so repeated calls return the same gateway when appropriate. Implement a
module-level cache (e.g., _GATEWAY_CACHE dict keyed by normalized_mode) and
update _build_gateway to check the cache, create and store the gateway
(RemoteRagGateway or LocalRagGateway) on first request, and return the cached
instance thereafter; if concurrency is a concern, guard cache mutation with a
simple lock (threading.Lock) to make get/create atomic.
backend/app/services/rag/retrieval_service.py (1)

229-242: Clarify the unused query parameter.

The del query statement (line 240) is an unconventional way to suppress "unused parameter" warnings. Consider using an underscore-prefixed name or adding a brief comment explaining why the parameter exists but isn't used.

Alternative approaches

Option 1: Underscore prefix in signature

     def decide_route_mode_for_chat_shell(
         self,
         *,
-        query: str,
+        query: str,  # Reserved for future fine-grained routing; currently unused
         knowledge_base_ids: list[int],

Option 2: Use _ assignment

-        del query
+        _ = query  # Reserved for future use
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/rag/retrieval_service.py` around lines 229 - 242, The
function decide_route_mode_for_chat_shell currently deletes the unused parameter
via "del query"; replace this with a clearer pattern by renaming the parameter
to _query (or prefixing it with an underscore in the signature) or, if renaming
is undesirable, keep the parameter name and add a one-line comment explaining
it's intentionally unused (e.g., "# query kept for API compatibility; not used
here") and remove the del query statement; locate this change in
decide_route_mode_for_chat_shell to ensure signatures and any callers remain
consistent.
knowledge_runtime/knowledge_runtime/api/internal/rag.py (1)

19-40: Instrument internal async endpoints with tracing decorators.

The three route handlers should be traced at function boundaries to keep request-level observability consistent across services.

As per coding guidelines, "Use @trace_async(span_name, tracer_name, extract_attributes) decorator to trace entire async functions in OpenTelemetry".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@knowledge_runtime/knowledge_runtime/api/internal/rag.py` around lines 19 -
40, Add the OpenTelemetry async tracing decorator to each internal route
handler: annotate index_document, query, and delete_document_index with
`@trace_async` using a descriptive span_name (e.g., "internal.rag.index_document",
"internal.rag.query", "internal.rag.delete_document_index"), a tracer_name
(e.g., "internal" or the service tracer), and an extract_attributes callable
that extracts relevant request fields (IDs, query text, tenant info) from the
RemoteIndexRequest/RemoteQueryRequest/RemoteDeleteDocumentIndexRequest; ensure
the decorator is imported and applied directly above each async function
definition so the entire async function is traced.
backend/app/services/rag/content_refs.py (1)

25-63: Add sync tracing decorator on content-ref construction path.

This sync service function is part of the remote indexing path and should be traced at function boundary.

As per coding guidelines, "Use @trace_sync(span_name, tracer_name, extract_attributes) decorator to trace entire sync functions in OpenTelemetry".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/rag/content_refs.py` around lines 25 - 63, Add the
OpenTelemetry sync tracer to the content-ref construction entrypoint by
decorating build_content_ref_for_attachment with `@trace_sync` using an
appropriate span name (e.g., "build_content_ref_for_attachment"), the backend
tracer name (e.g., "backend"), and an extract_attributes callable that captures
relevant inputs such as attachment_id and expires_delta_seconds; import
trace_sync if missing and ensure the decorator sits immediately above the
build_content_ref_for_attachment definition so the whole function is traced.
knowledge_runtime/knowledge_runtime/services/handlers.py (1)

17-33: Add tracing decorators to async handler methods.

These runtime async handlers should be wrapped with the project tracing decorator for end-to-end observability.

As per coding guidelines, "Use @trace_async(span_name, tracer_name, extract_attributes) decorator to trace entire async functions in OpenTelemetry".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@knowledge_runtime/knowledge_runtime/services/handlers.py` around lines 17 -
33, The async handler methods index_document, query, and delete_document_index
need to be instrumented with the project tracing decorator: add
`@trace_async`(...) above each async def to wrap the entire function; import
trace_async if missing and supply a clear span name (e.g., "index_document",
"query", "delete_document_index"), the appropriate tracer_name used in the
project, and the extract_attributes callable (or a lambda that extracts request
fields like knowledge_base_id, document_id, content_ref.kind) so the decorator
captures request attributes for observability. Ensure the decorator is applied
directly above each method signature in the handlers class and that any required
imports are added.
knowledge_engine/knowledge_engine/storage/base.py (1)

43-46: Narrow exception type to json.JSONDecodeError.

Catching bare Exception can mask unexpected errors. Since we're only handling JSON parse failures, use the specific exception:

       try:
           data = json.loads(stripped)
-      except Exception:
+      except json.JSONDecodeError:
           return raw_content
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@knowledge_engine/knowledge_engine/storage/base.py` around lines 43 - 46,
Replace the broad except Exception in the JSON parsing block with a specific
json.JSONDecodeError to avoid masking other errors: in the function/method where
json.loads(stripped) is called (look for variables data, stripped, raw_content
in storage/base.py) catch json.JSONDecodeError and return raw_content as before;
ensure the json module is imported so json.JSONDecodeError is available.
backend/app/api/endpoints/internal/rag_content.py (1)

48-85: Add OpenTelemetry tracing for observability.

Per coding guidelines, async functions should use @trace_async decorator for OpenTelemetry tracing. This internal endpoint would benefit from tracing for debugging RAG content retrieval issues.

+from app.core.tracing import trace_async
+
 @router.get("/{attachment_id}")
+@trace_async("stream_rag_attachment_content", "rag_content")
 async def stream_rag_attachment_content(

As per coding guidelines, "Use @trace_async(span_name, tracer_name, extract_attributes) decorator to trace entire async functions in OpenTelemetry".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/api/endpoints/internal/rag_content.py` around lines 48 - 85, Wrap
the async endpoint function stream_rag_attachment_content with the OpenTelemetry
decorator `@trace_async` using a descriptive span name (e.g.,
"stream_rag_attachment_content"), the appropriate tracer name, and an
extract_attributes callable that captures key attributes such as attachment_id
and context.status; apply this decorator above the function definition (which
currently depends on _verify_rag_download_authorization and get_db) so the full
function execution (including calls to context_service.get_context_optional and
context_service.get_attachment_binary_data and the final Response creation) is
traced for observability.
docker-compose-rag.yml (1)

10-14: Hardcoded credentials should be replaced with environment variable defaults.

While acceptable for local development, these hardcoded passwords could accidentally leak to production. Consider using environment variables with placeholder defaults that force explicit configuration:

   environment:
-    MYSQL_ROOT_PASSWORD: 123456
+    MYSQL_ROOT_PASSWORD: ${MYSQL_ROOT_PASSWORD:-changeme}
     MYSQL_DATABASE: task_manager
     MYSQL_USER: task_user
-    MYSQL_PASSWORD: task_password
+    MYSQL_PASSWORD: ${MYSQL_PASSWORD:-changeme}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker-compose-rag.yml` around lines 10 - 14, Replace the hardcoded DB
credentials under the environment block by reading from environment variables
with safe placeholder defaults so they aren’t baked into the compose file;
change the values for MYSQL_ROOT_PASSWORD, MYSQL_DATABASE, MYSQL_USER and
MYSQL_PASSWORD to reference env vars (e.g., use
${MYSQL_ROOT_PASSWORD:-<placeholder>} style) and update any documentation or
.env.example to require explicit configuration for these variables, ensuring
unique symbols MYSQL_ROOT_PASSWORD, MYSQL_DATABASE, MYSQL_USER and
MYSQL_PASSWORD are used when locating the entries to modify.
knowledge_engine/knowledge_engine/index/indexer.py (1)

45-81: Add concrete types to the new indexing boundary.

DocumentIndexer is a new cross-module service surface, but embed_model, **kwargs, Dict, and bare List keep most of the contract dynamic. Tightening these signatures now will make the new knowledge_engine API much safer to evolve and easier to validate in callers/tests.

As per coding guidelines, "Python: Follow PEP 8, Black formatter (line length: 88), isort, type hints required".

Also applies to: 110-115, 162-195

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@knowledge_engine/knowledge_engine/index/indexer.py` around lines 45 - 81,
DocumentIndexer exposes dynamic types; update the public signatures to use
concrete type hints: replace embed_model with a specific interface type (e.g.,
EmbeddingModel or EmbeddingModelProtocol), change return types from bare
Dict/List to dict[str, Any] or list[Document] as appropriate, and make **kwargs
explicit (e.g., *, namespace: str | None = None, batch_size: int | None = None)
in index_document, index_from_binary and _index_documents to document allowed
options; import typing names (dict, list, Any, Optional) and the
Document/EmbeddingModel types and apply the same concrete typing changes
referenced around the other occurrences (near the noted 110-115 and 162-195
regions) so callers/tests have strict, discoverable contracts.
backend/app/services/rag/remote_gateway.py (1)

105-178: Add tracing decorators to gateway operations.

The new remote index/query/delete paths are currently untraced; this will make runtime-mode debugging and latency attribution harder.

As per coding guidelines "Use @trace_async(span_name, tracer_name, extract_attributes) decorator to trace entire async functions in OpenTelemetry" and "Use add_span_event(name, attributes) to add event to current OpenTelemetry span".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/rag/remote_gateway.py` around lines 105 - 178, The three
async methods index_document, query, and delete_document_index lack
OpenTelemetry tracing; wrap each with the `@trace_async`(...) decorator (use a
descriptive span_name like
"rag.remote.index_document"/"rag.remote.query"/"rag.remote.delete_document_index"
and the appropriate tracer_name and extract_attributes callable) and add span
events via add_span_event(...) around key operations (e.g., before/after
building payload and before/after calling self._post_model) including attributes
such as knowledge_base_id, document_id/document_ref, index_owner_user_id, and
user_name to surface payload and latency details for _post_model calls; ensure
you import and use trace_async and add_span_event consistently for all three
methods (index_document, query, delete_document_index).
knowledge_engine/knowledge_engine/services/document_service.py (1)

20-93: Instrument service methods with tracing decorators.

Async and sync entrypoints here are not decorated for OpenTelemetry tracing, so this new indexing path will be under-observed in production.

As per coding guidelines "Use @trace_async(span_name, tracer_name, extract_attributes) decorator to trace entire async functions in OpenTelemetry" and "Use @trace_sync(span_name, tracer_name, extract_attributes) decorator to trace entire sync functions in OpenTelemetry".

Also applies to: 94-196

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@knowledge_engine/knowledge_engine/services/document_service.py` around lines
20 - 93, Add OpenTelemetry tracing decorators to the async entrypoints and their
sync counterparts: annotate async methods index_document_from_binary,
index_document_from_file, delete_document, list_documents (and other async
entrypoints in lines 94-196) with `@trace_async` using an appropriate span_name
(e.g., "index_document_from_binary", "index_document_from_file",
"delete_document", "list_documents"), a consistent tracer_name (e.g.,
"knowledge_engine.document_service"), and the existing extract_attributes
callable; likewise annotate the sync implementations
_index_document_from_binary_sync and _index_document_from_file_sync (and other
sync helpers) with `@trace_sync` using matching
span_name/tracer_name/extract_attributes so the full sync+async call path is
captured. Ensure imports for trace_async/trace_sync are added and that the
extract_attributes you pass extracts relevant parameters (knowledge_id, user_id,
document_id, source_file/file_path) for each span.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 00153d5c-f6e2-4f99-919a-857ee854b2c8

📥 Commits

Reviewing files that changed from the base of the PR and between a83c425 and 599fe6b.

⛔ Files ignored due to path filters (2)
  • backend/uv.lock is excluded by !**/*.lock
  • knowledge_engine/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (75)
  • backend/app/api/api.py
  • backend/app/api/endpoints/internal/__init__.py
  • backend/app/api/endpoints/internal/rag.py
  • backend/app/api/endpoints/internal/rag_content.py
  • backend/app/core/config.py
  • backend/app/services/auth/__init__.py
  • backend/app/services/auth/rag_download_token.py
  • backend/app/services/knowledge/indexing.py
  • backend/app/services/knowledge/knowledge_service.py
  • backend/app/services/rag/content_refs.py
  • backend/app/services/rag/embedding/factory.py
  • backend/app/services/rag/gateway.py
  • backend/app/services/rag/gateway_factory.py
  • backend/app/services/rag/local_data_plane/indexing.py
  • backend/app/services/rag/local_gateway.py
  • backend/app/services/rag/remote_gateway.py
  • backend/app/services/rag/retrieval_service.py
  • backend/app/services/rag/runtime_resolver.py
  • backend/app/services/rag/runtime_specs.py
  • backend/app/services/rag/storage/factory.py
  • backend/pyproject.toml
  • backend/tests/api/endpoints/internal/test_rag_content_endpoint.py
  • backend/tests/api/endpoints/internal/test_rag_retrieve_endpoint.py
  • backend/tests/core/test_config.py
  • backend/tests/integration/test_rag_remote_mode.py
  • backend/tests/services/knowledge/test_indexing.py
  • backend/tests/services/rag/test_content_refs.py
  • backend/tests/services/rag/test_local_data_plane_indexing.py
  • backend/tests/services/rag/test_local_gateway.py
  • backend/tests/services/rag/test_remote_gateway.py
  • backend/tests/services/rag/test_retrieval_service.py
  • backend/tests/services/rag/test_runtime_resolver.py
  • backend/tests/services/rag/test_runtime_specs.py
  • backend/tests/services/test_summary_service.py
  • backend/tests/tasks/test_knowledge_tasks.py
  • docker-compose-rag.yml
  • docker/knowledge_runtime/Dockerfile
  • docs/plans/2026-04-04-rag-service-extraction-implementation-plan.md
  • docs/specs/knowledge/2026-04-03-rag-service-extraction-design.md
  • knowledge_engine/knowledge_engine/__init__.py
  • knowledge_engine/knowledge_engine/embedding/__init__.py
  • knowledge_engine/knowledge_engine/index/__init__.py
  • knowledge_engine/knowledge_engine/index/indexer.py
  • knowledge_engine/knowledge_engine/query/__init__.py
  • knowledge_engine/knowledge_engine/services/__init__.py
  • knowledge_engine/knowledge_engine/services/document_service.py
  • knowledge_engine/knowledge_engine/splitter/__init__.py
  • knowledge_engine/knowledge_engine/splitter/config.py
  • knowledge_engine/knowledge_engine/splitter/factory.py
  • knowledge_engine/knowledge_engine/splitter/smart.py
  • knowledge_engine/knowledge_engine/splitter/splitter.py
  • knowledge_engine/knowledge_engine/storage/__init__.py
  • knowledge_engine/knowledge_engine/storage/base.py
  • knowledge_engine/knowledge_engine/storage/chunk_metadata.py
  • knowledge_engine/pyproject.toml
  • knowledge_engine/tests/test_document_service.py
  • knowledge_engine/tests/test_imports.py
  • knowledge_runtime/knowledge_runtime/__init__.py
  • knowledge_runtime/knowledge_runtime/api/health.py
  • knowledge_runtime/knowledge_runtime/api/internal/__init__.py
  • knowledge_runtime/knowledge_runtime/api/internal/rag.py
  • knowledge_runtime/knowledge_runtime/api/router.py
  • knowledge_runtime/knowledge_runtime/core/config.py
  • knowledge_runtime/knowledge_runtime/main.py
  • knowledge_runtime/knowledge_runtime/security.py
  • knowledge_runtime/knowledge_runtime/services/__init__.py
  • knowledge_runtime/knowledge_runtime/services/content_fetcher.py
  • knowledge_runtime/knowledge_runtime/services/handlers.py
  • knowledge_runtime/pyproject.toml
  • knowledge_runtime/start.sh
  • knowledge_runtime/tests/test_health.py
  • knowledge_runtime/tests/test_internal_rag.py
  • shared/models/__init__.py
  • shared/models/knowledge_runtime_protocol.py
  • shared/tests/test_knowledge_runtime_protocol.py

Comment on lines 148 to 161
async def delete_document_index_local(
    knowledge_base_id: int,
    document_ref: str,
    spec: DeleteRuntimeSpec,
    *,
    db: Session,
    index_owner_user_id: int | None = None,
) -> dict:
    """Delete document chunks from the local storage backend."""
    kb = (
        db.query(Kind)
        .filter(
            Kind.id == knowledge_base_id,
            Kind.kind == "KnowledgeBase",
            Kind.is_active,
        )
        .first()
    )
    del db
    storage_backend = create_storage_backend_from_runtime_config(spec.retriever_config)
    service = EngineDocumentService(storage_backend=storage_backend)
    return await service.delete_document(
        knowledge_id=str(spec.knowledge_base_id),
        doc_ref=spec.document_ref,
        user_id=spec.index_owner_user_id,
    )

⚠️ Potential issue | 🟠 Major

Local delete currently ignores enabled_index_families.

DeleteRuntimeSpec.enabled_index_families is resolved upstream, but this path never forwards it to the engine service. That means local and remote runtime modes can delete different index families for the same request. Please thread spec.enabled_index_families through the delete call, or remove the field until both modes honor the same contract.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/rag/local_data_plane/indexing.py` around lines 148 -
161, The local delete implementation in delete_document_index_local ignores
DeleteRuntimeSpec.enabled_index_families; update the call to
EngineDocumentService.delete_document to forward spec.enabled_index_families so
both local and remote modes honor the same contract. Locate
delete_document_index_local and the call to service.delete_document and add the
enabled_index_families (or the exact parameter name expected by
EngineDocumentService.delete_document) argument using the value from
spec.enabled_index_families; ensure any type/None handling matches the engine
method signature.

Comment on lines +56 to +62
        self._base_url = (base_url or settings.KNOWLEDGE_RUNTIME_URL).rstrip("/")
        self._token = token if token is not None else settings.INTERNAL_SERVICE_TOKEN
        self._timeout = timeout

    def _build_headers(self) -> dict[str, str]:
        return {"Authorization": f"Bearer {self._token}"}


⚠️ Potential issue | 🟠 Major

Fail fast when remote token is missing.

If INTERNAL_SERVICE_TOKEN is unset, Line 61 emits Authorization: Bearer None, which turns a config error into opaque auth failures. Validate token in __init__ and raise immediately.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/rag/remote_gateway.py` around lines 56 - 62, The
constructor currently assigns self._token from token or
settings.INTERNAL_SERVICE_TOKEN but doesn't validate it, leading _build_headers
to emit "Authorization: Bearer None"; update the __init__ of the RemoteGateway
(or the class containing self._token) to validate that the resolved token is a
non-empty string and raise a clear ValueError (or custom exception) immediately
if missing, and keep _build_headers returning {"Authorization": f"Bearer
{self._token}"} so later code never sees a None token.

Comment on lines +63 to +74
async def _post_model(self, path: str, payload: Any) -> dict[str, Any]:
    async with httpx.AsyncClient(timeout=self._timeout) as client:
        response = await client.post(
            f"{self._base_url}{path}",
            headers=self._build_headers(),
            json=payload.model_dump(mode="json", exclude_none=True),
        )

    if response.is_error:
        self._raise_remote_error(response)
    return response.json()


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify fallback catches and exception normalization points.
rg -n -C3 'except\s+RemoteRagGatewayError|RemoteQueryResponse\.model_validate\(|httpx\.(RequestError|HTTPError)' backend/app/services/rag backend/app/api/endpoints/internal

Repository: wecode-ai/Wegent

Length of output: 1366


🏁 Script executed:

#!/bin/bash
# Retrieve full _post_model method and surrounding context
fd -type f -name 'remote_gateway.py' backend/app/services/rag

Repository: wecode-ai/Wegent

Length of output: 230


🏁 Script executed:

#!/bin/bash
# Get line count to determine safe read strategy
wc -l backend/app/services/rag/remote_gateway.py

Repository: wecode-ai/Wegent

Length of output: 106


🏁 Script executed:

#!/bin/bash
cat -n backend/app/services/rag/remote_gateway.py

Repository: wecode-ai/Wegent

Length of output: 7260


🏁 Script executed:

#!/bin/bash
# Search for all callers of query() and _post_model() methods
rg -n '\.query\(|\.\_post_model\(' backend/app --type py

Repository: wecode-ai/Wegent

Length of output: 164


🏁 Script executed:

#!/bin/bash
# Search for all callers of query() and _post_model() methods
rg -n '\.query\(|_post_model\(' backend/app --type py

Repository: wecode-ai/Wegent

Length of output: 50372


🏁 Script executed:

#!/bin/bash
# Search specifically for RemoteRagGateway.query or rag_gateway.query calls
rg -n 'rag_gateway\.query|RemoteRagGateway' backend/app/api/endpoints/internal/rag.py -A 3 -B 3

Repository: wecode-ai/Wegent

Length of output: 855


Normalize all remote failures to RemoteRagGatewayError.

Line 64–69 (_post_model) can raise unhandled httpx transport exceptions (e.g., RequestError, ConnectError, TimeoutError), and line 158 (query) can raise ValidationError from schema validation. Both bypass the gateway-level error contract; the caller in backend/app/api/endpoints/internal/rag.py:207 catches only RemoteRagGatewayError to trigger remote→local fallback. These unhandled exceptions break the fallback mechanism and crash the operation instead.

Wrap both sites in try-except and convert to RemoteRagGatewayError:

🔧 Suggested patch
 async def _post_model(self, path: str, payload: Any) -> dict[str, Any]:
-    async with httpx.AsyncClient(timeout=self._timeout) as client:
-        response = await client.post(
-            f"{self._base_url}{path}",
-            headers=self._build_headers(),
-            json=payload.model_dump(mode="json", exclude_none=True),
-        )
+    try:
+        async with httpx.AsyncClient(timeout=self._timeout) as client:
+            response = await client.post(
+                f"{self._base_url}{path}",
+                headers=self._build_headers(),
+                json=payload.model_dump(mode="json", exclude_none=True),
+            )
+    except httpx.RequestError as exc:
+        raise RemoteRagGatewayError(
+            f"knowledge_runtime request failed: {exc}",
+            code="remote_request_failed",
+            retryable=True,
+        ) from exc

     if response.is_error:
         self._raise_remote_error(response)
     return response.json()
@@
     response_payload = await self._post_model("/internal/rag/query", payload)
-    response = RemoteQueryResponse.model_validate(response_payload)
+    try:
+        response = RemoteQueryResponse.model_validate(response_payload)
+    except ValidationError as exc:
+        raise RemoteRagGatewayError(
+            "knowledge_runtime returned invalid query payload",
+            code="remote_response_invalid",
+            status_code=502,
+            details={"validation_errors": exc.errors()},
+        ) from exc
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/rag/remote_gateway.py` around lines 63 - 74, The
_post_model and query code paths must normalize all remote-related exceptions
into RemoteRagGatewayError so the caller can trigger fallback; wrap the body of
async def _post_model(self, path: str, payload: Any) (the httpx.AsyncClient.post
call and response handling) in a try/except that catches httpx.RequestError (and
other httpx transport/errors) and re-raises RemoteRagGatewayError (including the
original exception message), while keeping response.is_error logic that calls
_raise_remote_error; likewise wrap the logic in query (the schema validation and
any downstream remote calls that raise pydantic.ValidationError or httpx errors)
in a try/except that catches ValidationError and httpx transport exceptions and
converts them into RemoteRagGatewayError carrying the original exception
details. Ensure you preserve existing calls to _raise_remote_error for HTTP
response errors and attach original exception info when constructing
RemoteRagGatewayError.

Comment thread backend/app/services/rag/runtime_resolver.py
Comment on lines +342 to +372
def _get_model_kind(
    self,
    *,
    db: Session,
    user_id: int,
    model_name: str,
    model_namespace: str,
):
    if model_namespace == "default":
        return (
            db.query(Kind)
            .filter(
                Kind.kind == "Model",
                Kind.name == model_name,
                Kind.namespace == model_namespace,
                Kind.is_active == True,
            )
            .filter((Kind.user_id == user_id) | (Kind.user_id == 0))
            .order_by(Kind.user_id.desc())
            .first()
        )
    return (
        db.query(Kind)
        .filter(
            Kind.kind == "Model",
            Kind.name == model_name,
            Kind.namespace == model_namespace,
            Kind.is_active == True,
        )
        .first()
    )

⚠️ Potential issue | 🔴 Critical

Scope non-default model lookups by user_id too.

The non-default branch drops the user_id predicate entirely. If two tenants define the same namespace/name, this can resolve the wrong model and forward that tenant’s decrypted API key / headers into the runtime config. Keep the tenant/shared-model filter in both branches.

As per coding guidelines, "Kind resource is uniquely identified by THREE fields: namespace, name, and user_id - always use all three fields to locate a Kind".

🧰 Tools
🪛 Ruff (0.15.9)

[error] 357-357: Avoid equality comparisons to True; use Kind.is_active: for truth checks

Replace with Kind.is_active

(E712)


[error] 369-369: Avoid equality comparisons to True; use Kind.is_active: for truth checks

Replace with Kind.is_active

(E712)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/rag/runtime_resolver.py` around lines 342 - 372, The
non-default branch of _get_model_kind currently omits the user_id predicate,
risking cross-tenant lookup; update the second return to include the same
tenant/shared filter used in the default branch: add .filter((Kind.user_id ==
user_id) | (Kind.user_id == 0)) and .order_by(Kind.user_id.desc()) before
.first() so the Kind query always uses namespace, name, and user_id to resolve
the correct Model Kind.

Comment thread knowledge_runtime/knowledge_runtime/core/config.py Outdated
Comment on lines +17 to +22
token = authorization[7:].strip()
if token != settings.INTERNAL_SERVICE_TOKEN:
    raise HTTPException(
        status_code=status.HTTP_401_UNAUTHORIZED,
        detail="Invalid internal service token",
    )

⚠️ Potential issue | 🔴 Critical

Guard against empty internal token configuration (auth bypass risk).

If INTERNAL_SERVICE_TOKEN is empty, a blank bearer token can authenticate. Reject startup/runtime auth checks when the configured token is empty.

🔒 Suggested fix
 def verify_internal_service_token(authorization: str = Header(default="")) -> None:
+    if not settings.INTERNAL_SERVICE_TOKEN:
+        raise HTTPException(
+            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
+            detail="Internal service token is not configured",
+        )
+
     if not authorization.startswith("Bearer "):
         raise HTTPException(
             status_code=status.HTTP_401_UNAUTHORIZED,
             detail="Missing or invalid Authorization header",
         )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
token = authorization[7:].strip()
if token != settings.INTERNAL_SERVICE_TOKEN:
    raise HTTPException(
        status_code=status.HTTP_401_UNAUTHORIZED,
        detail="Invalid internal service token",
    )
def verify_internal_service_token(authorization: str = Header(default="")) -> None:
    if not settings.INTERNAL_SERVICE_TOKEN:
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail="Internal service token is not configured",
        )

    if not authorization.startswith("Bearer "):
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Missing or invalid Authorization header",
        )

    token = authorization[7:].strip()
    if token != settings.INTERNAL_SERVICE_TOKEN:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Invalid internal service token",
        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@knowledge_runtime/knowledge_runtime/security.py` around lines 17 - 22, The
current auth check slices the incoming bearer token into token and compares it
to settings.INTERNAL_SERVICE_TOKEN but does not guard against an empty
configured token; update the code around the token extraction (the token
variable and the comparison to settings.INTERNAL_SERVICE_TOKEN) to first assert
that settings.INTERNAL_SERVICE_TOKEN is non-empty (e.g., if not
settings.INTERNAL_SERVICE_TOKEN: raise an HTTPException indicating
misconfiguration/startup error), and then proceed to compare the incoming token
to that value and raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED,
detail="Invalid internal service token") when they don’t match; this ensures an
empty INTERNAL_SERVICE_TOKEN cannot be used to bypass authentication.
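A further hardening step beyond the empty-token guard above (an editorial sketch, not code from the PR): compare tokens with a constant-time comparison so the check does not leak timing information. The helper name `tokens_match` is illustrative.

```python
import secrets


def tokens_match(provided: str, configured: str) -> bool:
    # An unset secret must never authenticate, even against a blank input.
    if not configured:
        return False
    # compare_digest runs in time independent of where the strings differ,
    # preventing byte-by-byte timing probes of the token value.
    return secrets.compare_digest(provided.encode(), configured.encode())


assert tokens_match("", "") is False
assert tokens_match("secret-token", "secret-token") is True
assert tokens_match("secret-token", "other-token") is False
```

Inside the FastAPI dependency, this helper would replace the plain `!=` comparison after the `Bearer ` prefix is stripped.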

Comment on lines +19 to +23
```python
    if content_ref.kind == "backend_attachment_stream":
        headers["Authorization"] = f"Bearer {content_ref.auth_token}"

    async with httpx.AsyncClient() as client:
        response = await client.get(content_ref.url, headers=headers)
```

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
set -euo pipefail

echo "Definitions / builders for backend attachment refs:"
rg -n -C4 --type=py 'BackendAttachmentStreamContentRef|backend_attachment_stream'

echo
echo "Any host/origin validation before auth_token is forwarded:"
rg -n -C4 --type=py 'urlsplit|urlparse|netloc|allowlist|trusted.*(host|origin)|BACKEND_.*(URL|HOST)'
```

Repository: wecode-ai/Wegent

Length of output: 50372


🏁 Script executed:

```shell
#!/bin/bash
set -euo pipefail

echo "=== How knowledge_runtime receives RemoteIndexRequest ==="
rg -n 'RemoteIndexRequest|content_ref' knowledge_runtime/ --type=py -A 3 -B 3 | head -150

echo
echo "=== Knowledge runtime route/endpoint definitions ==="
rg -n 'router|@app|def.*index|RemoteIndexRequest' knowledge_runtime/knowledge_runtime/ --type=py | grep -E '(router|@|def )' | head -40

echo
echo "=== Check if content_ref can be passed as external input ==="
rg -n 'RemoteIndexRequest.*model_validate|from_dict|parse_obj' knowledge_runtime/ --type=py -A 2
```

Repository: wecode-ai/Wegent

Length of output: 8944


🏁 Script executed:

```shell
#!/bin/bash
set -euo pipefail

echo "=== verify_internal_service_token implementation ==="
rg -n 'verify_internal_service_token|def verify' knowledge_runtime/ --type=py -A 10

echo
echo "=== How does the backend build and send RemoteIndexRequest? ==="
rg -n 'RemoteIndexRequest|backend_attachment_stream' backend/app/ --type=py -B 2 -A 2 | head -60

echo
echo "=== Search for settings.BACKEND_INTERNAL_URL usage ==="
rg -n 'BACKEND_INTERNAL_URL' backend/ --type=py | head -10
```

Repository: wecode-ai/Wegent

Length of output: 5857


Validate the backend origin before forwarding the download token.

The code sends content_ref.auth_token to any URL in content_ref.url without checking that it belongs to the backend. While the builder in backend/app/services/rag/content_refs.py hard-pins the URL to settings.BACKEND_INTERNAL_URL, knowledge_runtime itself has no such validation. If INTERNAL_SERVICE_TOKEN is leaked or an internal caller is compromised, this allows credential exfiltration to an arbitrary host.

Add an origin check in fetch_content() (around line 19) to ensure backend_attachment_stream URLs match the expected backend origin.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@knowledge_runtime/knowledge_runtime/services/content_fetcher.py` around lines
19 - 23, In fetch_content(), avoid unconditionally forwarding
content_ref.auth_token to arbitrary URLs: before setting
headers["Authorization"] when content_ref.kind == "backend_attachment_stream",
parse content_ref.url and verify its origin (scheme + host + optional port)
matches the expected backend origin (use settings.BACKEND_INTERNAL_URL or a
derived origin constant) and only then attach the Authorization header; if the
origin does not match, do not include the token and raise/return an error (or
log and abort) to prevent credential exfiltration. Ensure you reference
content_ref.kind, content_ref.url, content_ref.auth_token and
settings.BACKEND_INTERNAL_URL in the check.

Comment on lines +17 to +39
```python
async def index_document(self, request: RemoteIndexRequest) -> dict:
    content = await content_fetcher.fetch_content(request.content_ref)
    return {
        "status": "accepted",
        "knowledge_id": str(request.knowledge_base_id),
        "document_id": request.document_id,
        "content_bytes": len(content),
        "content_ref_kind": request.content_ref.kind,
        "index_families": request.index_families,
    }

async def query(self, request: RemoteQueryRequest) -> RemoteQueryResponse:
    return RemoteQueryResponse(records=[], total=0, total_estimated_tokens=0)

async def delete_document_index(
    self, request: RemoteDeleteDocumentIndexRequest
) -> dict:
    return {
        "status": "accepted",
        "knowledge_id": str(request.knowledge_base_id),
        "doc_ref": request.document_ref,
        "enabled_index_families": request.enabled_index_families,
    }
```

⚠️ Potential issue | 🟠 Major

Avoid returning “accepted” from no-op runtime operations.

Lines 17-39 acknowledge index/query/delete requests without executing any engine operations, so remote mode can report success while doing no real work.

🛠️ Fail-fast alternative until engine wiring is complete
```diff
 class RuntimeHandlers:
     async def index_document(self, request: RemoteIndexRequest) -> dict:
-        content = await content_fetcher.fetch_content(request.content_ref)
-        return {
-            "status": "accepted",
-            "knowledge_id": str(request.knowledge_base_id),
-            "document_id": request.document_id,
-            "content_bytes": len(content),
-            "content_ref_kind": request.content_ref.kind,
-            "index_families": request.index_families,
-        }
+        raise NotImplementedError("index_document is not wired to knowledge_engine yet")

     async def query(self, request: RemoteQueryRequest) -> RemoteQueryResponse:
-        return RemoteQueryResponse(records=[], total=0, total_estimated_tokens=0)
+        raise NotImplementedError("query is not wired to knowledge_engine yet")

     async def delete_document_index(
         self, request: RemoteDeleteDocumentIndexRequest
     ) -> dict:
-        return {
-            "status": "accepted",
-            "knowledge_id": str(request.knowledge_base_id),
-            "doc_ref": request.document_ref,
-            "enabled_index_families": request.enabled_index_families,
-        }
+        raise NotImplementedError("delete_document_index is not wired to knowledge_engine yet")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@knowledge_runtime/knowledge_runtime/services/handlers.py` around lines 17 -
39, The current handlers index_document, query, and delete_document_index return
"accepted" without performing engine work; change them to fail-fast while the
engine is unwired by either raising NotImplementedError (with a clear message
containing request.document_id/request.knowledge_base_id where applicable) or
returning an explicit error payload (e.g., status: "not_implemented" and reason
including the request identifiers) from index_document, query
(RemoteQueryResponse should indicate error or raise), and delete_document_index
so we do not report success for no-op operations.

Comment on lines +21 to +27
```python
class BackendAttachmentStreamContentRef(KnowledgeRuntimeProtocolModel):
    """Content reference resolved by streaming through Backend."""

    kind: Literal["backend_attachment_stream"]
    url: str
    auth_token: str
    expires_at: datetime | None = None
```

⚠️ Potential issue | 🟠 Major

❓ Verification inconclusive

Script executed:

```shell
cat -n shared/models/knowledge_runtime_protocol.py | head -80
```

Repository: wecode-ai/Wegent
Exit code: 0

stdout:

     1	# SPDX-FileCopyrightText: 2025 Weibo, Inc.
     2	#
     3	# SPDX-License-Identifier: Apache-2.0
     4	
     5	"""Lightweight transport models for Backend <-> knowledge_runtime."""
     6	
     7	from __future__ import annotations
     8	
     9	from datetime import datetime
    10	from typing import Annotated, Any, Literal
    11	
    12	from pydantic import BaseModel, ConfigDict, Field
    13	
    14	
    15	class KnowledgeRuntimeProtocolModel(BaseModel):
    16	    """Base protocol model with strict field validation."""
    17	
    18	    model_config = ConfigDict(extra="forbid")
    19	
    20	
    21	class BackendAttachmentStreamContentRef(KnowledgeRuntimeProtocolModel):
    22	    """Content reference resolved by streaming through Backend."""
    23	
    24	    kind: Literal["backend_attachment_stream"]
    25	    url: str
    26	    auth_token: str
    27	    expires_at: datetime | None = None
    28	
    29	
    30	class PresignedUrlContentRef(KnowledgeRuntimeProtocolModel):
    31	    """Content reference resolved directly from object storage."""
    32	
    33	    kind: Literal["presigned_url"]
    34	    url: str
    35	    expires_at: datetime | None = None
    36	
    37	
    38	ContentRef = Annotated[
    39	    BackendAttachmentStreamContentRef | PresignedUrlContentRef,
    40	    Field(discriminator="kind"),
    41	]
    42	
    43	RetrievalPolicy = Literal[
    44	    "chunk_only",
    45	    "summary_first",
    46	    "summary_then_chunk_expand",
    47	    "hybrid",
    48	]
    49	
    50	RetrievalMode = Literal["vector", "keyword", "hybrid"]
    51	
    52	
    53	class KnowledgeRuntimeAuth(KnowledgeRuntimeProtocolModel):
    54	    """Simple internal auth carrier for the runtime service."""
    55	
    56	    scheme: Literal["bearer"] = "bearer"
    57	    token: str
    58	
    59	
    60	class RemoteRagError(KnowledgeRuntimeProtocolModel):
    61	    """Standardized remote error payload."""
    62	
    63	    code: str
    64	    message: str
    65	    retryable: bool = False
    66	    details: dict[str, Any] | None = None
    67	
    68	
    69	class RuntimeRetrieverConfig(KnowledgeRuntimeProtocolModel):
    70	    """Resolved retriever identity and storage configuration."""
    71	
    72	    name: str
    73	    namespace: str = "default"
    74	    storage_config: dict[str, Any] = Field(default_factory=dict)
    75	
    76	
    77	class RuntimeEmbeddingModelConfig(KnowledgeRuntimeProtocolModel):
    78	    """Resolved embedding model configuration."""
    79	
    80	    model_name: str

Script executed:

```shell
# Check for SecretStr imports or similar masking mechanisms
rg -i "secretstr|secret" shared/models/knowledge_runtime_protocol.py
```

Repository: wecode-ai/Wegent
Exit code: 1


Script executed:

```shell
# Search for logging configurations that might expose model data
rg -r "logging|logger|log\(" shared/models/knowledge_runtime_protocol.py -A 3
```

Repository: wecode-ai/Wegent
Exit code: 0

stdout:

docs/plans/2026-04-04-rag-service-extraction-implementation-plan.md:- `logging|logger|log\(`
docs/plans/2026-04-04-rag-service-extraction-implementation-plan.md-- `shared/models/__init__.py`
docs/plans/2026-04-04-rag-service-extraction-implementation-plan.md-- `shared/tests/test_knowledge_runtime_protocol.py`
docs/plans/2026-04-04-rag-service-extraction-implementation-plan.md-
--
docs/plans/2026-04-04-rag-service-extraction-implementation-plan.md:- Modify: `logging|logger|log\(`
docs/plans/2026-04-04-rag-service-extraction-implementation-plan.md-- Modify: `shared/models/__init__.py`
docs/plans/2026-04-04-rag-service-extraction-implementation-plan.md-- Modify: `shared/tests/test_knowledge_runtime_protocol.py`
docs/plans/2026-04-04-rag-service-extraction-implementation-plan.md-- Modify: `backend/app/services/rag/runtime_specs.py`

Script executed:

```shell
# Check for custom __repr__ or __str__ overrides
ast-grep --pattern "def __repr__($$$) { $$$ }" shared/models/knowledge_runtime_protocol.py
```

Repository: wecode-ai/Wegent
Exit code: 1


Script executed:

```shell
# Search for pydantic config related to serialization
rg "class Config|model_config|json_schema" shared/models/knowledge_runtime_protocol.py -A 5
```

Repository: wecode-ai/Wegent
Exit code: 0

stdout:

    model_config = ConfigDict(extra="forbid")


class BackendAttachmentStreamContentRef(KnowledgeRuntimeProtocolModel):
    """Content reference resolved by streaming through Backend."""

--
    embedding_model_config: RuntimeEmbeddingModelConfig
    retrieval_config: RuntimeRetrievalConfig


class RemoteIndexRequest(KnowledgeRuntimeProtocolModel):
    """Index request sent from Backend to knowledge_runtime."""
--
    embedding_model_config: RuntimeEmbeddingModelConfig
    splitter_config: dict[str, Any] | None = None
    index_families: list[str] = Field(default_factory=lambda: ["chunk_vector"])
    content_ref: ContentRef
    trace_context: dict[str, Any] | None = None
    user_name: str | None = None

Script executed:

```shell
# Find files that import or use these models
rg "BackendAttachmentStreamContentRef|KnowledgeRuntimeAuth" --type py -l
```

Repository: wecode-ai/Wegent
Exit code: 0

stdout:

shared/tests/test_knowledge_runtime_protocol.py
knowledge_runtime/knowledge_runtime/services/content_fetcher.py
shared/models/knowledge_runtime_protocol.py
shared/models/__init__.py
knowledge_runtime/tests/test_internal_rag.py
backend/app/services/rag/content_refs.py

Script executed:

```shell
# Search for serialization patterns and HTTP calls using these models
rg "model_dump|json|serialize" shared/models/knowledge_runtime_protocol.py -A 3 -B 3
```

Repository: wecode-ai/Wegent
Exit code: 1


Script executed:

```shell
# Check if there are any Field() definitions with exclude or other masking
rg "Field\(" shared/models/knowledge_runtime_protocol.py
```

Repository: wecode-ai/Wegent
Exit code: 0

stdout:

    Field(discriminator="kind"),
    storage_config: dict[str, Any] = Field(default_factory=dict)
    resolved_config: dict[str, Any] = Field(default_factory=dict)
    index_families: list[str] = Field(default_factory=lambda: ["chunk_vector"])
    enabled_index_families: list[str] = Field(default_factory=lambda: ["chunk_vector"])
    knowledge_base_configs: list[RemoteKnowledgeBaseQueryConfig] = Field(
    enabled_index_families: list[str] = Field(default_factory=lambda: ["chunk_vector"])

Script executed:

```shell
# Look for logging calls in knowledge_runtime related files
find . -type f -name "*.py" -path "*/knowledge*" | xargs rg "logger|logging\..*\(.*auth_token|token\)" 2>/dev/null | head -20
```

Repository: wecode-ai/Wegent
Exit code: 0

stdout:

./knowledge_runtime/knowledge_runtime/api/internal/rag.py:    _: None = Depends(verify_internal_service_token),
./knowledge_runtime/knowledge_runtime/api/internal/rag.py:    _: None = Depends(verify_internal_service_token),
./knowledge_runtime/knowledge_runtime/api/internal/rag.py:    _: None = Depends(verify_internal_service_token),
./knowledge_engine/knowledge_engine/storage/base.py:logger = logging.getLogger(__name__)
./knowledge_engine/knowledge_engine/index/indexer.py:logger = logging.getLogger(__name__)
./knowledge_engine/knowledge_engine/index/indexer.py:                logger.warning(
./chat_shell/chat_shell/tools/knowledge_injection_strategy.py:logger = logging.getLogger(__name__)
./chat_shell/chat_shell/tools/knowledge_injection_strategy.py:        logger.info(
./chat_shell/chat_shell/tools/knowledge_injection_strategy.py:        logger.debug(
./chat_shell/chat_shell/tools/knowledge_injection_strategy.py:        logger.info(
./chat_shell/chat_shell/tools/knowledge_injection_strategy.py:        logger.info(
./chat_shell/chat_shell/tools/knowledge_injection_strategy.py:        logger.info(
./chat_shell/chat_shell/tools/knowledge_injection_strategy.py:            logger.info(
./chat_shell/chat_shell/tools/knowledge_injection_strategy.py:            logger.info(
./chat_shell/chat_shell/tools/knowledge_content_cleaner.py:logger = logging.getLogger(__name__)
./chat_shell/chat_shell/tools/knowledge_content_cleaner.py:            logger.debug(
./chat_shell/chat_shell/tools/knowledge_content_cleaner.py:            logger.info(
./chat_shell/chat_shell/tools/knowledge_content_cleaner.py:        original_tokens = int(len(content) / chars_per_token)
./chat_shell/chat_shell/tools/knowledge_content_cleaner.py:        cleaned_tokens = int(len(cleaned_content) / chars_per_token)
./chat_shell/chat_shell/tools/knowledge_factory.py:logger = logging.getLogger(__name__)

Script executed:

```shell
# Check test files for expected behavior
find . -type f -name "*knowledge_runtime*test*.py" -o -name "test_*knowledge_runtime*.py" | head -5
```

Repository: wecode-ai/Wegent
Exit code: 0

stdout:

./shared/tests/test_knowledge_runtime_protocol.py

Script executed:

```shell
# Check how BackendAttachmentStreamContentRef.auth_token is used
cat -n backend/app/services/rag/content_refs.py | grep -A 10 -B 10 "auth_token\|BackendAttachmentStreamContentRef"
```

Repository: wecode-ai/Wegent
Exit code: 0

stdout:

     8	
     9	from datetime import datetime, timedelta, timezone
    10	
    11	from sqlalchemy.orm import Session
    12	
    13	from app.core.config import settings
    14	from app.models.subtask_context import ContextType
    15	from app.services.auth.rag_download_token import create_rag_download_token
    16	from app.services.context import context_service
    17	from shared.models import (
    18	    BackendAttachmentStreamContentRef,
    19	    PresignedUrlContentRef,
    20	)
    21	
    22	DEFAULT_CONTENT_REF_TTL_SECONDS = 300
    23	
    24	
    25	def build_content_ref_for_attachment(
    26	    *,
    27	    db: Session,
    28	    attachment_id: int,
    29	    expires_delta_seconds: int = DEFAULT_CONTENT_REF_TTL_SECONDS,
    30	) -> BackendAttachmentStreamContentRef | PresignedUrlContentRef:
    31	    """Build a content reference for an attachment used by remote indexing."""
    32	
    33	    context = context_service.get_context_optional(
    34	        db=db,
    35	        context_id=attachment_id,
    36	    )
    37	    if context is None or context.context_type != ContextType.ATTACHMENT.value:
    38	        raise ValueError(f"Attachment {attachment_id} not found")
    39	
    40	    expires_at = datetime.now(timezone.utc) + timedelta(seconds=expires_delta_seconds)
--
    43	        context=context,
    44	        expires=expires_delta_seconds,
    45	    )
    46	    if presigned_url:
    47	        return PresignedUrlContentRef(
    48	            kind="presigned_url",
    49	            url=presigned_url,
    50	            expires_at=expires_at,
    51	        )
    52	
    53	    auth_token = create_rag_download_token(
    54	        attachment_id=attachment_id,
    55	        expires_delta_seconds=expires_delta_seconds,
    56	    )
    57	    base_url = settings.BACKEND_INTERNAL_URL.rstrip("/")
    58	    return BackendAttachmentStreamContentRef(
    59	        kind="backend_attachment_stream",
    60	        url=f"{base_url}{settings.API_PREFIX}/internal/rag/content/{attachment_id}",
    61	        auth_token=auth_token,
    62	        expires_at=expires_at,
    63	    )

Script executed:

```shell
# Check knowledge_runtime content_fetcher usage
cat -n knowledge_runtime/knowledge_runtime/services/content_fetcher.py | head -100
```

Repository: wecode-ai/Wegent
Exit code: 0

stdout:

     1	# SPDX-FileCopyrightText: 2026 Weibo, Inc.
     2	#
     3	# SPDX-License-Identifier: Apache-2.0
     4	
     5	from __future__ import annotations
     6	
     7	import httpx
     8	
     9	from shared.models.knowledge_runtime_protocol import (
    10	    BackendAttachmentStreamContentRef,
    11	    PresignedUrlContentRef,
    12	)
    13	
    14	
    15	async def fetch_content(
    16	    content_ref: BackendAttachmentStreamContentRef | PresignedUrlContentRef,
    17	) -> bytes:
    18	    headers = {}
    19	    if content_ref.kind == "backend_attachment_stream":
    20	        headers["Authorization"] = f"Bearer {content_ref.auth_token}"
    21	
    22	    async with httpx.AsyncClient() as client:
    23	        response = await client.get(content_ref.url, headers=headers)
    24	        response.raise_for_status()
    25	        return response.content

Script executed:

```shell
# Check test file to see expected behavior
cat -n shared/tests/test_knowledge_runtime_protocol.py
```

Repository: wecode-ai/Wegent
Exit code: 0

stdout:

     1	# SPDX-FileCopyrightText: 2025 Weibo, Inc.
     2	#
     3	# SPDX-License-Identifier: Apache-2.0
     4	
     5	from __future__ import annotations
     6	
     7	import pytest
     8	from pydantic import ValidationError
     9	
    10	import shared.models as shared_models
    11	
    12	
    13	def _require_model(name: str):
    14	    model = getattr(shared_models, name, None)
    15	    if model is None:
    16	        pytest.fail(f"shared.models must export {name}")
    17	    return model
    18	
    19	
    20	def test_shared_models_exports_knowledge_runtime_protocol_types() -> None:
    21	    exported_names = [
    22	        "BackendAttachmentStreamContentRef",
    23	        "PresignedUrlContentRef",
    24	        "RuntimeEmbeddingModelConfig",
    25	        "RuntimeRetrievalConfig",
    26	        "RuntimeRetrieverConfig",
    27	        "RemoteKnowledgeBaseQueryConfig",
    28	        "RemoteDeleteDocumentIndexRequest",
    29	        "RemoteIndexRequest",
    30	        "RemoteQueryRequest",
    31	        "RemoteQueryRecord",
    32	        "RemoteQueryResponse",
    33	    ]
    34	
    35	    for name in exported_names:
    36	        assert getattr(shared_models, name, None) is not None
    37	
    38	
    39	def test_remote_index_request_accepts_backend_attachment_stream_content_ref() -> None:
    40	    remote_index_request = _require_model("RemoteIndexRequest")
    41	
    42	    request = remote_index_request.model_validate(
    43	        {
    44	            "knowledge_base_id": 11,
    45	            "document_id": 22,
    46	            "index_owner_user_id": 33,
    47	            "retriever_config": {
    48	                "name": "retriever-a",
    49	                "namespace": "default",
    50	                "storage_config": {
    51	                    "type": "milvus",
    52	                    "url": "http://milvus:19530",
    53	                },
    54	            },
    55	            "embedding_model_config": {
    56	                "model_name": "text-embedding-3-large",
    57	                "model_namespace": "default",
    58	                "resolved_config": {"protocol": "openai"},
    59	            },
    60	            "index_families": ["chunk_vector"],
    61	            "content_ref": {
    62	                "kind": "backend_attachment_stream",
    63	                "url": "http://backend:8000/api/internal/rag/content/22",
    64	                "auth_token": "test-token",
    65	            },
    66	        }
    67	    )
    68	
    69	    assert request.content_ref.kind == "backend_attachment_stream"
    70	    assert request.content_ref.auth_token == "test-token"
    71	
    72	
    73	def test_remote_index_request_rejects_unknown_content_ref_kind() -> None:
    74	    remote_index_request = _require_model("RemoteIndexRequest")
    75	
    76	    with pytest.raises(ValidationError):
    77	        remote_index_request.model_validate(
    78	            {
    79	                "knowledge_base_id": 11,
    80	                "document_id": 22,
    81	                "index_owner_user_id": 33,
    82	                "retriever_config": {"provider": "milvus"},
    83	                "embedding_model_config": {"model": "text-embedding-3-large"},
    84	                "index_families": ["chunk_vector"],
    85	                "content_ref": {
    86	                    "kind": "unsupported_kind",
    87	                    "url": "http://backend:8000/api/internal/rag/content/22",
    88	                },
    89	            }
    90	        )
    91	
    92	
    93	def test_remote_query_response_preserves_index_family_per_record() -> None:
    94	    remote_query_response = _require_model("RemoteQueryResponse")
    95	
    96	    response = remote_query_response.model_validate(
    97	        {
    98	            "records": [
    99	                {
   100	                    "content": "Chunk A",
   101	                    "title": "Doc A",
   102	                    "score": 0.91,
   103	                    "knowledge_base_id": 1001,
   104	                    "index_family": "chunk_vector",
   105	                },
   106	                {
   107	                    "content": "Summary B",
   108	                    "title": "Doc B",
   109	                    "score": 0.88,
   110	                    "knowledge_base_id": 1002,
   111	                    "index_family": "summary_vector",
   112	                },
   113	            ],
   114	            "total": 2,
   115	        }
   116	    )
   117	
   118	    assert [record.index_family for record in response.records] == [
   119	        "chunk_vector",
   120	        "summary_vector",
   121	    ]
   122	
   123	
   124	def test_remote_query_request_accepts_explicit_execution_configs() -> None:
   125	    remote_query_request = _require_model("RemoteQueryRequest")
   126	
   127	    request = remote_query_request.model_validate(
   128	        {
   129	            "knowledge_base_ids": [1001],
   130	            "query": "release checklist",
   131	            "max_results": 6,
   132	            "knowledge_base_configs": [
   133	                {
   134	                    "knowledge_base_id": 1001,
   135	                    "index_owner_user_id": 42,
   136	                    "retriever_config": {
   137	                        "name": "retriever-a",
   138	                        "namespace": "default",
   139	                        "storage_config": {
   140	                            "type": "qdrant",
   141	                            "url": "http://qdrant:6333",
   142	                            "indexStrategy": {"mode": "per_dataset"},
   143	                        },
   144	                    },
   145	                    "embedding_model_config": {
   146	                        "model_name": "embed-a",
   147	                        "model_namespace": "default",
   148	                        "resolved_config": {
   149	                            "protocol": "openai",
   150	                            "model_id": "text-embedding-3-small",
   151	                            "base_url": "https://api.openai.com/v1",
   152	                        },
   153	                    },
   154	                    "retrieval_config": {
   155	                        "top_k": 8,
   156	                        "score_threshold": 0.55,
   157	                        "retrieval_mode": "hybrid",
   158	                        "vector_weight": 0.8,
   159	                        "keyword_weight": 0.2,
   160	                    },
   161	                }
   162	            ],
   163	            "enabled_index_families": ["chunk_vector", "summary_vector_index"],
   164	            "retrieval_policy": "summary_then_chunk_expand",
   165	        }
   166	    )
   167	
   168	    assert (
   169	        request.knowledge_base_configs[0].retriever_config.storage_config["type"]
   170	        == "qdrant"
   171	    )
   172	    assert request.knowledge_base_configs[0].retrieval_config.top_k == 8
   173	    assert request.enabled_index_families == [
   174	        "chunk_vector",
   175	        "summary_vector_index",
   176	    ]
   177	
   178	
   179	def test_remote_delete_request_requires_resolved_retriever_config() -> None:
   180	    remote_delete_request = _require_model("RemoteDeleteDocumentIndexRequest")
   181	
   182	    request = remote_delete_request.model_validate(
   183	        {
   184	            "knowledge_base_id": 101,
   185	            "document_ref": "202",
   186	            "index_owner_user_id": 303,
   187	            "retriever_config": {
   188	                "name": "retriever-a",
   189	                "namespace": "default",
   190	                "storage_config": {
   191	                    "type": "elasticsearch",
   192	                    "url": "http://es:9200",
   193	                    "indexStrategy": {"mode": "per_user", "prefix": "wegent"},
   194	                },
   195	            },
   196	            "enabled_index_families": ["chunk_vector"],
   197	        }
   198	    )
   199	
   200	    assert request.retriever_config.name == "retriever-a"
   201	    assert request.retriever_config.storage_config["type"] == "elasticsearch"

Mask transport-layer secrets in protocol models to prevent accidental exposure in logs and repr output.

BackendAttachmentStreamContentRef.auth_token (line 26) and KnowledgeRuntimeAuth.token (line 57) are plain str fields, so Pydantic's default serialization and __repr__ will expose live bearer/download tokens. Switch these to SecretStr or use field exclusion to prevent token leakage in request logs and debugging output. Verify that outbound HTTP calls still send the raw token value (via field access or custom serializer).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shared/models/knowledge_runtime_protocol.py` around lines 21 - 27, The auth
token fields currently stored as plain
strings—BackendAttachmentStreamContentRef.auth_token and
KnowledgeRuntimeAuth.token—must be changed to Pydantic-safe secret types or
excluded from repr/serialization to avoid leaking credentials; update those
fields to use pydantic.SecretStr (or annotate with Field(repr=False,
exclude=True)) in the KnowledgeRuntimeProtocolModel subclasses and adjust any
code that sends tokens (HTTP clients, headers builders) to call
.get_secret_value() (or the equivalent accessor) so outbound requests still send
the raw token value. Ensure model serialization/parse logic preserves
compatibility (e.g., accept plain strings on input) and verify unit tests or
callers that access token value are updated to use the SecretStr accessor.

@kissghosts kissghosts force-pushed the feature/rag-service-extraction branch from 599fe6b to 4b1dcc6 Compare April 6, 2026 03:18
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 20

♻️ Duplicate comments (5)
docker/knowledge_runtime/Dockerfile (1)

5-29: ⚠️ Potential issue | 🟠 Major

Drop root before starting the service.

The container still runs as root, which increases breakout impact for a network-facing process. Create a dedicated user after the install steps and switch to it before CMD, making sure /app, /shared, and /knowledge_engine remain readable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docker/knowledge_runtime/Dockerfile` around lines 5 - 29, Create a non-root
runtime user after the install steps in the Dockerfile (after RUN uv pip install
...) and ensure /app, /shared, and /knowledge_engine are owned/readable by that
user; add a user/group (e.g., appuser), chown those directories to the new user,
and switch to it with the USER instruction before the existing CMD that runs
uvicorn (the CMD invoking "uvicorn knowledge_runtime.main:app ..."). Ensure
permissions allow the service to read/write as needed while the image no longer
runs as root.
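
A minimal sketch of the non-root steps, assuming a Debian-based image; the user/group names, ownership scheme, and port are placeholders to adapt to the actual image layout:

```dockerfile
# After the uv pip install steps: create a system user and hand over the
# directories the service reads (paths from the review comment).
RUN addgroup --system app \
    && adduser --system --ingroup app appuser \
    && chown -R appuser:app /app /shared /knowledge_engine

USER appuser

# Same CMD as before, now executed as the unprivileged user.
CMD ["uvicorn", "knowledge_runtime.main:app", "--host", "0.0.0.0", "--port", "8000"]
```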
knowledge_runtime/knowledge_runtime/core/config.py (1)

15-15: ⚠️ Potential issue | 🟠 Major

Make INTERNAL_SERVICE_TOKEN non-default.

A fixed fallback token weakens the internal auth boundary and is easy to miss in non-development deployments. Require the env var instead of silently accepting "knowledge-runtime-token".

Suggested fix
-    INTERNAL_SERVICE_TOKEN: str = "knowledge-runtime-token"
+    INTERNAL_SERVICE_TOKEN: str
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@knowledge_runtime/knowledge_runtime/core/config.py` at line 15, Replace the
hardcoded fallback string for the module-level constant INTERNAL_SERVICE_TOKEN
with a required environment lookup and fail fast if it's missing; specifically,
stop assigning "knowledge-runtime-token" and instead read
os.environ["INTERNAL_SERVICE_TOKEN"] (or use a helper that raises) and raise a
clear RuntimeError/ValueError during import if the env var is not set so the
service cannot start with a default token; update any initialization/tests that
assumed the default to provide the env var.
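
A fail-fast helper along these lines would do it (the name and wiring are illustrative; the real setting lives on the service's Settings class):

```python
import os


def require_env(name: str) -> str:
    """Raise at startup instead of silently using a baked-in default token."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} must be set; refusing to start with a default token")
    return value


# Illustrative use at import time:
# INTERNAL_SERVICE_TOKEN = require_env("INTERNAL_SERVICE_TOKEN")
```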
backend/app/services/rag/runtime_resolver.py (2)

425-447: ⚠️ Potential issue | 🔴 Critical

Keep non-default model lookups tenant-scoped too.

The model_namespace != "default" branch drops the user_id / shared-model filter and ordering, so the first active model with that namespace/name wins across tenants. That can attach another tenant's model config and decrypted credentials to the runtime payload.

Based on learnings, "Kind resource is uniquely identified by THREE fields: namespace, name, and user_id - always use all three fields to locate a Kind".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/rag/runtime_resolver.py` around lines 425 - 447, The
non-default branch in runtime_resolver.py currently queries Kind without scoping
by user_id; update the second db.query(Kind) block (the branch where
model_namespace != "default") to include the same tenant-scoping and ordering as
the default branch: filter on (Kind.user_id == user_id) | (Kind.user_id == 0)
and order_by(Kind.user_id.desc()) so the lookup uses the three identifying
fields (Kind.namespace, Kind.name, Kind.user_id) and picks tenant-owned over
shared before calling .first(); keep the existing filters for Kind.kind,
Kind.name, Kind.namespace, and Kind.is_active.
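
The intended precedence can be illustrated without a database; the dict keys mirror the Kind columns named in the comment, and resolve_kind is a hypothetical helper, not the resolver's actual code:

```python
def resolve_kind(rows, *, namespace, name, user_id):
    """Tenant-scoped lookup: only the requester's rows or shared rows
    (user_id == 0) qualify, and tenant-owned rows win the tie."""
    candidates = [
        r for r in rows
        if r["namespace"] == namespace
        and r["name"] == name
        and r["is_active"]
        and r["user_id"] in (user_id, 0)
    ]
    # Equivalent of order_by(Kind.user_id.desc()).first(): any tenant-owned
    # user_id is > 0, so it sorts ahead of the shared (user_id == 0) row.
    candidates.sort(key=lambda r: r["user_id"], reverse=True)
    return candidates[0] if candidates else None


rows = [
    {"namespace": "team-a", "name": "embed-model", "user_id": 0, "is_active": True},
    {"namespace": "team-a", "name": "embed-model", "user_id": 7, "is_active": True},
    {"namespace": "team-a", "name": "embed-model", "user_id": 9, "is_active": True},
]
```

Note that user 9's row never reaches user 7's lookup at all, which is exactly the cross-tenant leak the unscoped query permits.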

318-332: ⚠️ Potential issue | 🔴 Critical

Restore tenant-scoped knowledge-base lookup.

_get_knowledge_base_record() still resolves any active KnowledgeBase by id only, and the new query/public/delete builders all depend on it. That lets one tenant drive runtime-config resolution from another tenant's KB. Thread requester identity through this helper and use the tenant-aware lookup path instead.

Based on learnings, "Kind resource is uniquely identified by THREE fields: namespace, name, and user_id - always use all three fields to locate a Kind".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/rag/runtime_resolver.py` around lines 318 - 332, The
helper _get_knowledge_base_record currently queries by Kind.id only; change it
to be tenant-scoped by accepting the requester/tenant identity (e.g., add params
for namespace and user_id or a requester object) and update the query to filter
on the three unique Kind identifiers (Kind.namespace, Kind.name, Kind.user_id)
in addition to Kind.kind == "KnowledgeBase" and Kind.is_active, then update all
callers of _get_knowledge_base_record to thread the requester/tenant info so the
lookup uses the tenant-aware path rather than resolving by id alone.
shared/models/knowledge_runtime_protocol.py (1)

21-27: ⚠️ Potential issue | 🟠 Major

Mask transport tokens in these protocol models.

auth_token and token are plain str, so repr(), validation errors, or model_dump() can expose live bearer/download tokens. Prefer SecretStr here and update the header builders to read the raw secret explicitly.

🛠️ Possible fix
-from pydantic import BaseModel, ConfigDict, Field
+from pydantic import BaseModel, ConfigDict, Field, SecretStr
@@
-    auth_token: str
+    auth_token: SecretStr = Field(repr=False)
@@
-    token: str
+    token: SecretStr = Field(repr=False)

Call sites that build Authorization headers will then need .get_secret_value().

Also applies to: 53-57

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shared/models/knowledge_runtime_protocol.py` around lines 21 - 27, Change
plain-string token fields to secure types: replace the auth_token: str in
BackendAttachmentStreamContentRef and any other models that define token: str
(see the other model around the 53-57 block) with SecretStr so secrets aren't
exposed via repr()/model_dump()/validation errors, and update any
header-building code that uses these models to call .get_secret_value() to
obtain the raw token (e.g., where Authorization or download headers are
constructed). Ensure imports for SecretStr are added and all call sites
retrieving the token are updated accordingly.
🧹 Nitpick comments (17)
backend/app/services/rag/retrieval/filters.py (1)

12-12: Consider sorting __all__ for consistency.

Static analysis flags that __all__ is not sorted. This is a minor style issue.

🔧 Suggested fix
-__all__ = ["parse_metadata_filters", "build_elasticsearch_filters"]
+__all__ = ["build_elasticsearch_filters", "parse_metadata_filters"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/rag/retrieval/filters.py` at line 12, The module-level
export list __all__ is unsorted; please reorder it alphabetically (e.g.,
["build_elasticsearch_filters", "parse_metadata_filters"]) so __all__ is
deterministic and satisfies static analysis; update the __all__ assignment near
the top of the module (where __all__ is defined) to the sorted list while
leaving the functions parse_metadata_filters and build_elasticsearch_filters
untouched.
backend/app/api/endpoints/adapter/retrievers.py (1)

244-244: Prefer underscore prefix over explicit del for unused parameters.

Using del current_user is unconventional. If the parameter exists only to enforce authentication and is never used in the function body, prefix it with an underscore instead, keeping the Depends so the auth check still runs.

🔧 Suggested alternative
 @router.post("/test-connection")
 async def test_retriever_connection(
     test_data: dict,
-    current_user: User = Depends(security.get_current_user),
+    _current_user: User = Depends(security.get_current_user),
 ):
     ...
     _check_rag_available()
-    del current_user
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/api/endpoints/adapter/retrievers.py` at line 244, The explicit
"del current_user" line is unconventional; instead rename the unused dependency
parameter to start with an underscore (e.g., _current_user) in the function
signature that currently contains "del current_user" (keep the Depends(...) so
authentication still runs) and remove the "del current_user" statement;
alternatively, if you prefer to keep the name, leave the parameter but prefix it
with an underscore in that same function's signature to signal it's
intentionally unused (update any type hints accordingly).
knowledge_engine/tests/test_storage_factory.py (1)

39-46: Verify that type key exclusion is intentional.

The assertion expects the config to omit the "type" key (which was "fake" in the input). That is correct if the factory strips the backend type before passing the config to the constructor; if so, document the transformation with an explicit comment or a separate assertion.

💡 Suggested enhancement for clarity
     assert isinstance(backend, FakeBackend)
+    # Factory should strip 'type' key and normalize config before passing to backend
     assert captured["config"] == {
         "url": "http://vector-store:1234",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@knowledge_engine/tests/test_storage_factory.py` around lines 39 - 46, The
test currently asserts that captured["config"] lacks the "type" key but doesn't
document whether that omission is intentional; update the test in
test_storage_factory (the test that sets up captured and asserts
captured["config"]) to explicitly verify and document the transformation by
either adding an assertion that the original input contained "type": "fake" and
that the factory removed it, or by adding a concise comment explaining that
StorageFactory strips the backend "type" before passing config to the
constructor (reference the captured variable and the factory/create method used
in the test).
backend/app/services/rag/storage/elasticsearch_backend.py (1)

12-12: Sort __all__ alphabetically.

Per static analysis (RUF022), the exports should be sorted. This is a minor style consistency fix.

✏️ Suggested fix
-__all__ = ["ElasticsearchBackend", "Elasticsearch"]
+__all__ = ["Elasticsearch", "ElasticsearchBackend"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/rag/storage/elasticsearch_backend.py` at line 12, The
__all__ export list is not alphabetized; update the module-level __all__
(currently ["ElasticsearchBackend", "Elasticsearch"]) to be sorted
alphabetically so the exports read ["Elasticsearch", "ElasticsearchBackend"] to
satisfy RUF022 and consistent style.
knowledge_engine/knowledge_engine/embedding/factory.py (1)

30-80: Consider splitting _create_embedding_model_from_resolved_values into smaller helpers.

This function is just over the preferred limit and now mixes protocol routing with object construction. Extracting protocol-specific builders will keep it easier to evolve safely.

As per coding guidelines, "Function length: Max 50 lines per function (preferred)."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@knowledge_engine/knowledge_engine/embedding/factory.py` around lines 30 - 80,
The function _create_embedding_model_from_resolved_values is doing protocol
routing and object construction; refactor by extracting protocol-specific
builders: implement a helper _build_openai_embedding(protocol, model_id,
api_key, base_url, custom_headers, dimensions) that returns either
CustomEmbedding or OpenAIEmbedding for protocol "openai", and another helper
_build_custom_protocol_embedding(protocol, model_name, model_id, api_key,
base_url, custom_headers, dimensions) that validates base_url and returns
CustomEmbedding for protocols in {"cohere","jina","custom"}; update
_create_embedding_model_from_resolved_values to call those helpers for the
corresponding protocol branches and keep the final unsupported-protocol
ValueError in place.
knowledge_engine/knowledge_engine/retrieval/filters.py (1)

20-98: Refactor parse_metadata_filters into smaller helpers.

parse_metadata_filters currently exceeds the preferred function-size guideline. Splitting operator-map/constants and condition parsing will improve maintainability.

As per coding guidelines, "Function length: Max 50 lines per function (preferred)."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@knowledge_engine/knowledge_engine/retrieval/filters.py` around lines 20 - 98,
Split parse_metadata_filters into smaller helpers: move the OPERATOR_MAP
constant out of parse_metadata_filters to a module-level constant (e.g.,
OPERATOR_MAP) and create a helper function to build the filters list (e.g.,
_build_filters_from_conditions(knowledge_id, metadata_condition)) that contains
the loop creating ExactMatchFilter, MetadataFilter and mapping operators, plus a
small helper (e.g., _get_condition_type(metadata_condition)) to determine the
"and"/"or" string; then simplify parse_metadata_filters to call these helpers
and return MetadataFilters(filters=..., condition=...), keeping the existing
semantics and using ExactMatchFilter, MetadataFilter and MetadataFilters as
before.
backend/tests/services/rag/test_runtime_specs.py (1)

143-154: Single-case parameterized test could be simplified.

The parameterized test now has only one test case. Consider converting it to a regular test function for clarity, or document if more cases are planned.

-@pytest.mark.parametrize(
-    ("kwargs", "expected_message"),
-    [
-        (
-            {"source_type": "attachment"},
-            "Field required",
-        ),
-    ],
-)
-def test_index_source_enforces_coherent_shape(kwargs, expected_message):
-    with pytest.raises(ValidationError, match=expected_message):
-        IndexSource(**kwargs)
+def test_index_source_requires_attachment_id():
+    with pytest.raises(ValidationError, match="Field required"):
+        IndexSource(source_type="attachment")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/tests/services/rag/test_runtime_specs.py` around lines 143 - 154, The
test test_index_source_enforces_coherent_shape currently uses
pytest.mark.parametrize with a single case; simplify by removing the
`@pytest.mark.parametrize` decorator and directly call
IndexSource(**{"source_type": "attachment"}) inside the with
pytest.raises(ValidationError, match="Field required") block (or alternatively
expand to more cases if you intend to add them), keeping the test name and the
assertion on IndexSource the same.
knowledge_engine/knowledge_engine/storage/factory.py (1)

28-76: Consider extracting duplicated validation and config-building logic.

Both create_storage_backend_from_config and create_storage_backend_from_runtime_config duplicate the validation check and config dictionary construction. Extract shared logic to reduce duplication.

♻️ Suggested refactor
+def _validate_storage_type(storage_type: str) -> str:
+    normalized = storage_type.lower()
+    if normalized not in STORAGE_BACKEND_REGISTRY:
+        raise ValueError(
+            f"Unsupported storage type: {normalized or '<missing>'}. "
+            f"Supported types: {list(STORAGE_BACKEND_REGISTRY.keys())}"
+        )
+    return normalized
+
+
+def _build_backend_config(
+    url: Optional[str],
+    username: Optional[str],
+    password: Optional[str],
+    api_key: Optional[str],
+    index_strategy: Optional[Dict[str, Any]],
+    ext: Optional[Dict[str, Any]],
+) -> Dict[str, Any]:
+    return {
+        "url": url,
+        "username": username,
+        "password": password,
+        "apiKey": api_key,
+        "indexStrategy": index_strategy or {"mode": "per_dataset"},
+        "ext": ext or {},
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@knowledge_engine/knowledge_engine/storage/factory.py` around lines 28 - 76,
Both functions create_storage_backend_from_config and
create_storage_backend_from_runtime_config duplicate the same validation against
STORAGE_BACKEND_REGISTRY and construction of the config dict; refactor by
extracting a small helper (e.g., _validate_and_build_storage_config or
_get_storage_backend_class_and_config) that accepts the raw inputs (storage_type
and a mapping or explicit params) and returns the validated backend_class and
the normalized config dict (with keys
"url","username","password","apiKey","indexStrategy","ext"); then replace the
bodies of create_storage_backend_from_config and
create_storage_backend_from_runtime_config to call that helper and simply return
backend_class(config). Ensure the helper handles lowercasing the type, raises
the same ValueError message when missing/unsupported, and applies defaults for
indexStrategy and ext.
backend/app/api/endpoints/internal/rag_content.py (1)

79-85: Loading full attachment content into memory may cause OOM for large files.

Using Response(content=binary_data) loads the entire attachment into memory. For large files, consider using StreamingResponse with chunked reading to avoid memory exhaustion.

♻️ Suggested improvement using StreamingResponse
+from fastapi.responses import StreamingResponse
+import io

-    return Response(
-        content=binary_data,
-        media_type=context.mime_type,
-        headers={
-            "Content-Disposition": _build_content_disposition(context.original_filename)
-        },
-    )
+    return StreamingResponse(
+        io.BytesIO(binary_data),
+        media_type=context.mime_type,
+        headers={
+            "Content-Disposition": _build_content_disposition(context.original_filename)
+        },
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/api/endpoints/internal/rag_content.py` around lines 79 - 85, The
current return uses Response(content=binary_data) which loads the entire file
into memory; change this to use StreamingResponse (from starlette.responses) and
stream the attachment in chunks instead of passing binary_data directly.
Implement a generator/iterable that reads the file-like source in chunks (or
wraps bytes in io.BytesIO and yields chunks) and pass that generator to
StreamingResponse with media_type=context.mime_type and
headers={"Content-Disposition":
_build_content_disposition(context.original_filename)}; update the return in the
same place where Response, binary_data, context.mime_type, and
_build_content_disposition are referenced.
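
Note that wrapping the bytes in io.BytesIO only changes the response type, not the memory profile; real relief requires a file-like source read in chunks. A sketch of that shape, where open_attachment is a hypothetical handle to the stored file:

```python
import io
from typing import BinaryIO, Iterator

CHUNK_SIZE = 64 * 1024


def iter_chunks(source: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield fixed-size chunks so only one chunk is resident at a time."""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            return
        yield chunk


# FastAPI usage sketch:
# return StreamingResponse(iter_chunks(open_attachment(context)),
#                          media_type=context.mime_type)

demo = b"x" * 200_000
chunks = list(iter_chunks(io.BytesIO(demo)))
```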
knowledge_runtime/knowledge_runtime/api/internal/rag.py (1)

20-49: Consider adding response models for consistency.

The query endpoint uses response_model=RemoteQueryResponse, but other endpoints return generic dict. Adding response models improves API documentation and type safety.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@knowledge_runtime/knowledge_runtime/api/internal/rag.py` around lines 20 -
49, The endpoints index_document, delete_document_index, and test_connection
return generic dicts while query uses response_model=RemoteQueryResponse; update
each route decorator to include the appropriate response_model (e.g.,
response_model=RemoteIndexResponse for index_document,
response_model=RemoteDeleteDocumentIndexResponse for delete_document_index, and
response_model=RemoteTestConnectionResponse for test_connection) and change the
function return type annotations to those concrete response model classes;
ensure you import the model classes and keep the handlers calls
(runtime_handlers.index_document, runtime_handlers.delete_document_index,
runtime_handlers.test_connection) unchanged so they return/serialize to the
specified models for consistent docs and typing.
backend/app/api/endpoints/rag.py (1)

107-108: Consider logging the remote gateway error before falling back.

The fallback to LocalRagGateway silently swallows the RemoteRagGatewayError. Adding a log statement helps with debugging and monitoring fallback frequency.

♻️ Suggested improvement
+import logging
+
+logger = logging.getLogger(__name__)
+
         try:
             result = await gateway.query(runtime_spec, db=db)
         except RemoteRagGatewayError:
+            logger.warning("Remote RAG gateway failed, falling back to local gateway")
             result = await LocalRagGateway().query(runtime_spec, db=db)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/api/endpoints/rag.py` around lines 107 - 108, When catching
RemoteRagGatewayError in the endpoint, log the caught exception before falling
back to LocalRagGateway: update the except RemoteRagGatewayError block to record
the error (e.g., logger.error or logger.exception) including the exception
details and context (runtime_spec or other relevant identifiers) and then call
await LocalRagGateway().query(runtime_spec, db=db); reference
RemoteRagGatewayError, LocalRagGateway, query, runtime_spec and db to locate the
change.
backend/app/services/rag/runtime_specs.py (1)

5-17: Reuse the shared RetrievalPolicy alias here.

This redeclares the same literal union that already exists in shared.models. Keeping two copies makes local/remote mode drift likely the next time a policy is added or renamed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/rag/runtime_specs.py` around lines 5 - 17, The local
redeclaration of RetrievalPolicy should be removed and the alias imported from
shared.models to avoid drift; replace the Literal[...] block with an import of
RetrievalPolicy from shared.models (same module that defines
RemoteKnowledgeBaseQueryConfig, RuntimeEmbeddingModelConfig,
RuntimeRetrievalConfig, RuntimeRetrieverConfig) and update any local references
to use that imported RetrievalPolicy symbol.
knowledge_runtime/knowledge_runtime/services/handlers.py (1)

28-110: Instrument the new remote handler entrypoints.

index_document(), query(), delete_document_index(), and test_connection() are the main runtime boundaries, but none of them open a top-level span or attach request identifiers. That will make remote-mode failures much harder to correlate.

As per coding guidelines, "Use @trace_async(span_name, tracer_name, extract_attributes) decorator to trace entire async functions in OpenTelemetry".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@knowledge_runtime/knowledge_runtime/services/handlers.py` around lines 28 -
110, The four remote handler methods (index_document, query,
delete_document_index, test_connection) lack OpenTelemetry tracing; wrap each
async method with the `@trace_async` decorator (using an appropriate span_name and
tracer_name) and provide an extract_attributes callable that pulls request
identifiers and key context (e.g., request.request_id or request.id,
request.knowledge_base_id/knowledge_base_configs items,
request.index_owner_user_id, document_id/document_ref, query text or
retrieval_config identifiers) so every top-level entrypoint opens a span and
attaches those attributes for correlation; ensure the decorator is
imported/available and apply it to each method signature.
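
The decorator shape the guideline names can be shown as a dependency-free stand-in; the project's real trace_async presumably opens an OpenTelemetry span, so the dict recorded here is only a placeholder making the wiring visible:

```python
import asyncio
import functools


def trace_async(span_name, tracer_name, extract_attributes=None):
    """Stand-in for the project's decorator: records what the real one
    would attach to an OpenTelemetry span."""

    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            attrs = extract_attributes(*args, **kwargs) if extract_attributes else {}
            # Real implementation: tracer.start_as_current_span(span_name)
            # with span.set_attributes(attrs) wrapped around the await below.
            wrapper.last_span = {"name": span_name, "tracer": tracer_name, "attributes": attrs}
            return await func(*args, **kwargs)

        return wrapper

    return decorator


@trace_async(
    "knowledge_runtime.query",
    "knowledge_runtime",
    extract_attributes=lambda request: {"knowledge_base_id": request["knowledge_base_id"]},
)
async def query(request):
    return {"records": []}


result = asyncio.run(query({"knowledge_base_id": 101}))
```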
knowledge_runtime/tests/test_internal_rag.py (1)

218-227: Add a success-path /internal/rag/query test.

The only query coverage here is the 401 branch. The new aggregation path still has no regression coverage for record merging, sorting, truncation, or document_id / index_family shaping.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@knowledge_runtime/tests/test_internal_rag.py` around lines 218 - 227, Add a
new success-path test in knowledge_runtime/tests/test_internal_rag.py that uses
TestClient(create_app()) to POST to "/internal/rag/query" with a valid
Authorization header and a payload similar to the existing test; seed or mock
the underlying retrieval/aggregation layer (or use test fixtures) so the
aggregation path runs and assert response.status_code == 200 and response.json()
contains the expected merged records; specifically verify that records are
merged correctly across sources, sorted by relevance, truncated to any
configured max_results, and that each returned item includes the shaped fields
"document_id" and "index_family" (use the same helper/mocking utilities as other
tests to locate functions interacting with the retrieval/aggregation pipeline
and validate merging, sorting, and truncation behavior).
knowledge_engine/knowledge_engine/storage/milvus_backend.py (2)

59-61: Use immutable type annotation to avoid mutable class attribute default.

The class attribute _milvusclient_config has a mutable default {}. While it's reassigned in __init__, this pattern can cause subtle bugs if subclasses don't call super().__init__(). Declare it as a type annotation only.

♻️ Proposed fix
-    # Store config for lazy async client creation
-    _milvusclient_config: Dict[str, Any] = {}
+    # Store config for lazy async client creation (initialized in __init__)
+    _milvusclient_config: Dict[str, Any]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@knowledge_engine/knowledge_engine/storage/milvus_backend.py` around lines 59
- 61, The class attribute _milvusclient_config is defined with a mutable default
{} which can lead to shared-state bugs; change it to a type-only annotation
(e.g., _milvusclient_config: Dict[str, Any]) without assigning {} in the class
body and ensure the instance-specific dict is created in __init__ (or where the
lazy async client configuration is set) so subclasses that omit super().__init__
won't accidentally share state.
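
The failure mode is easy to demonstrate in isolation; the class names here are illustrative, not the Milvus backend itself:

```python
from typing import Any, Dict


class SharedDefault:
    # Assigned mutable at class level: one dict shared by every instance
    # that never overwrites it (e.g. a subclass skipping super().__init__()).
    _config: Dict[str, Any] = {}


class AnnotationOnly:
    # Annotation without assignment: there is no class-level default to
    # mutate, so forgetting to initialize raises AttributeError instead of
    # silently sharing state.
    _config: Dict[str, Any]

    def __init__(self) -> None:
        self._config = {}


a, b = SharedDefault(), SharedDefault()
a._config["uri"] = "http://milvus:19530"  # mutates the single class dict

c, d = AnnotationOnly(), AnnotationOnly()
c._config["uri"] = "http://milvus:19530"  # only touches c's own dict
```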

720-726: Consider logging client close failures for debugging.

The try-except-pass pattern when closing the client silently swallows any errors. While unlikely, connection close failures can indicate resource issues. A debug log would help with troubleshooting.

♻️ Proposed fix
         finally:
             # Ensure client is closed to avoid connection leaks
             if client:
                 try:
                     client.close()
-                except Exception:
-                    pass
+                except Exception as e:
+                    logger.debug(f"[Milvus] Error closing client: {e}")

The same pattern appears in test_connection (lines 743-749) and get_all_chunks (lines 823-829).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@knowledge_engine/knowledge_engine/storage/milvus_backend.py` around lines 720
- 726, The current client.close() calls in the finally blocks of the methods use
a try-except-pass pattern that silently ignores exceptions. To improve
debugging, update these blocks in the methods containing client.close()
(including the finally blocks in the current method, test_connection, and
get_all_chunks) to catch exceptions and log the failure using a debug or error
log instead of passing silently. This will help surface connection close
failures without interrupting normal operation.
knowledge_engine/knowledge_engine/storage/qdrant_backend.py (1)

563-570: Move logging import to module level.

The logging module is imported inside the exception handler, which is unconventional. Import it at the module level alongside other imports and create a module-level logger for consistency with other backend implementations.

♻️ Proposed fix

Add at the top of the file (after line 19):

import logging

Add after imports (around line 35):

logger = logging.getLogger(__name__)

Then update the exception handler:

         except Exception as e:
             # Log error but return empty list to allow fallback to RAG
-            import logging
-
-            logger = logging.getLogger(__name__)
             logger.warning(
                 f"[Qdrant] Failed to get all chunks for KB {knowledge_id}: {e}"
             )
             return []
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@knowledge_engine/knowledge_engine/storage/qdrant_backend.py` around lines 563
- 570, The logging import is done inside an exception handler; move "import
logging" to the module level and create a module-level logger (e.g., logger =
logging.getLogger(__name__)) near the other imports so it's consistent with
other backends, then update the except block that currently references a
locally-imported logger to use the module-level logger (the block that logs
"[Qdrant] Failed to get all chunks for KB {knowledge_id}: {e}").
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/api/endpoints/rag.py`:
- Around line 113-116: The except blocks in rag.py currently re-raise
HTTPException without chaining the original exceptions, losing the original
tracebacks; update both exception handlers in the function containing these
blocks to use exception chaining (e.g., raise HTTPException(status_code=400,
detail=str(e)) from e and raise HTTPException(status_code=500, detail=str(e))
from e) so the original traceback is preserved for debugging—locate the
ValueError and generic Exception handlers (the except ValueError as e and except
Exception as e blocks) and add "from e" to each raised HTTPException.

In `@backend/app/core/config.py`:
- Around line 238-267: Update the parse_rag_runtime_mode validator to enforce
only "local" or "remote" values: when v is None return "local"; when v is a
Mapping parse keys/values to lowercased strings but validate each resulting
value is exactly "local" or "remote" and raise ValueError for any invalid entry;
when v is a string trim and, if JSON, decode to a Mapping and validate as above,
otherwise ensure the final lowercased string equals "local" or "remote" and
raise ValueError if not; apply the same strict validation logic to the other
validator referenced around lines 543-550 so both parse/field_validator paths
only accept "local" or "remote" and fail-fast on misconfiguration.
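
A strict parser along the lines the prompt describes (function name taken from the comment; wiring it into the pydantic field_validator is left out):

```python
import json
from collections.abc import Mapping

VALID_MODES = {"local", "remote"}


def parse_rag_runtime_mode(v):
    if v is None:
        return "local"
    if isinstance(v, Mapping):
        # Per-operation mapping: lowercase keys/values, reject unknown modes.
        parsed = {str(k).lower(): str(val).lower() for k, val in v.items()}
        invalid = {k: val for k, val in parsed.items() if val not in VALID_MODES}
        if invalid:
            raise ValueError(f"Invalid RAG runtime mode(s) {invalid}; expected 'local' or 'remote'")
        return parsed
    if isinstance(v, str):
        stripped = v.strip()
        if stripped.startswith("{"):
            # JSON-encoded mapping: decode, then validate as a Mapping.
            return parse_rag_runtime_mode(json.loads(stripped))
        mode = stripped.lower()
        if mode not in VALID_MODES:
            raise ValueError(f"Invalid RAG runtime mode {mode!r}; expected 'local' or 'remote'")
        return mode
    raise ValueError(f"Unsupported RAG runtime mode value: {v!r}")
```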

In `@backend/app/services/rag/local_data_plane/administration.py`:
- Around line 15-26: The new async executor test_connection_local lacks
OpenTelemetry tracing; add the `@trace_async` decorator to the async function (use
trace_async with a clear span name like "administration.test_connection_local"
and the appropriate tracer name, e.g., "backend") and supply an
extract_attributes callable that records useful inputs (e.g.,
spec.retriever_config or other non-sensitive attributes) so the whole async
boundary is traced; update the function definition for test_connection_local to
be decorated with trace_async accordingly.

In `@backend/app/services/rag/local_data_plane/indexing.py`:
- Around line 164-180: The helper functions _build_index_storage_backend and
_build_index_embed_model are missing return type annotations; update their
signatures to include appropriate return types (e.g., the concrete storage
backend type returned by
create_storage_backend/create_storage_backend_from_runtime_config and the embed
model type returned by whatever create_embed_model/_embed_model_factory returns)
so static typing is satisfied; locate the functions named
_build_index_storage_backend and _build_index_embed_model in indexing.py and add
explicit return type hints matching the actual classes/interfaces they return
(or use a suitable Protocol/Union/typing.Any if multiple types are possible),
and update any imports if necessary to reference those types.

In `@backend/app/services/rag/retrieval/__init__.py`:
- Line 12: The __all__ export list in the module is unsorted which triggers Ruff
RUF022; update the __all__ declaration to list the exported names in sorted
order, e.g., ensure "__all__ = ['build_elasticsearch_filters',
'parse_metadata_filters']" so the symbol names (build_elasticsearch_filters and
parse_metadata_filters) are alphabetized.

In `@backend/tests/api/endpoints/test_rag_api.py`:
- Around line 66-76: The test uses patch(..., create=True) when mocking
build_public_query_runtime_spec which allows silent creation of the attribute;
remove the create=True argument from the patch call that targets
build_public_query_runtime_spec so the patch validates the real import path
(i.e., update the patch invocation for
"app.api.endpoints.rag.runtime_resolver.build_public_query_runtime_spec" to omit
create=True), keeping the rest of the mock setup (return_value=runtime_spec and
the surrounding with(...) context) unchanged.

In `@backend/tests/api/endpoints/test_retrievers_api.py`:
- Around line 71-75: The test asserts an inconsistent validation message: the
request includes storage_type but the expected response lists both storage_type
and url as missing. Update the assertion in
backend/tests/api/endpoints/test_retrievers_api.py to match the actual request
payload—either change the expected message to "Missing required fields: url"
when storage_type is provided, or modify the request to omit storage_type if you
intend to assert both are missing; update the response.json() expected dict
accordingly (reference the response variable and the failing assertion block).

In `@docker-compose.yml`:
- Around line 79-84: Remove the predictable fallback for INTERNAL_SERVICE_TOKEN
and stop exposing the service port to the host by default: require the env var
INTERNAL_SERVICE_TOKEN to be set (do not provide a default value) and remove or
comment out the host port binding "${KNOWLEDGE_RUNTIME_PORT:-8200}:8200" so the
container only listens on its internal port (keep PORT: 8200 and HOST: 0.0.0.0
if needed for internal network access), and ensure similar changes are applied
to the other occurrence of the same mapping and env (lines referenced 133-134).
Update README or compose docs to instruct operators to explicitly set
INTERNAL_SERVICE_TOKEN and host port mapping when they intentionally want host
exposure.

In `@docker/knowledge_runtime/Dockerfile`:
- Around line 13-18: The Dockerfile currently copies pyproject.toml and runs "uv
pip install --system --no-cache -r pyproject.toml", which bypasses UV's project
management; instead copy the full project (COPY knowledge_runtime /app) and
install via UV from that directory (e.g., RUN uv pip install /app or the UV "uv
install" workflow) so lockfiles and reproducible installs are respected—update
the RUN line that references "uv pip install --system --no-cache -r
pyproject.toml" to the idiomatic "uv pip install /app" (or "uv install") and
ensure pyproject.toml remains inside /app before running the command.

In `@knowledge_engine/knowledge_engine/embedding/custom.py`:
- Around line 72-73: The current code assumes
response.json()["data"][0]["embedding"] always exists; add defensive validation
after response.raise_for_status(): parse body = response.json(), verify body is
a dict, that "data" is present and is a non-empty list, and that body["data"][0]
contains "embedding"; if any check fails, raise a clear ValueError (or custom
exception) including a descriptive message and the raw response text/JSON to aid
debugging instead of letting a KeyError/IndexError surface; ensure the code
still assigns embedding when validations pass (the lines around
response.raise_for_status(), response.json(), and the embedding variable are the
places to update).
- Around line 61-77: The retry decorator on _call_api is currently
unconditional; change it to only retry transient network/HTTP exceptions by
adding a retry= parameter (e.g.,
retry=retry_if_exception_type((requests.exceptions.ConnectionError,
requests.exceptions.Timeout, requests.exceptions.HTTPError))) so
JSONDecodeError/KeyError are not retried; update the `@retry`(...) invocation that
decorates _call_api to include that retry predicate and import the required
tenacity helper(s) and requests exception types.
- Around line 58-59: The class defines async _aget_query_embedding but lacks the
required async _aget_text_embedding needed by BaseEmbedding; implement an async
def _aget_text_embedding(self, text: str) -> list[float] that delegates to the
synchronous _get_text_embedding via asyncio.to_thread (mirroring
_aget_query_embedding) so async callers (e.g., aget_text_embedding_batch) work
correctly and avoid runtime errors.
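For the first custom.py finding (defensive parsing of the embeddings response), one way to sketch the validation is as a pure helper that takes the already-parsed body, so malformed responses raise a descriptive `ValueError` instead of a bare `KeyError`/`IndexError` (the function name is illustrative, not from the PR):

```python
from typing import Any

def extract_embedding(body: Any, raw_text: str = "") -> list[float]:
    # Validate the shape of an OpenAI-style embeddings response before
    # indexing into body["data"][0]["embedding"].
    if not isinstance(body, dict):
        raise ValueError(f"Embedding response is not a JSON object: {raw_text[:200]!r}")
    data = body.get("data")
    if not isinstance(data, list) or not data:
        raise ValueError(f"Embedding response missing non-empty 'data' list: {raw_text[:200]!r}")
    first = data[0]
    if not isinstance(first, dict) or "embedding" not in first:
        raise ValueError(f"Embedding response entry missing 'embedding': {raw_text[:200]!r}")
    return first["embedding"]
```

The caller would invoke this right after `response.raise_for_status()`, passing `response.json()` and `response.text`.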
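For the retry finding, tenacity's `retry=retry_if_exception_type((requests.exceptions.ConnectionError, requests.exceptions.Timeout, requests.exceptions.HTTPError))` expresses the predicate directly; a dependency-free sketch of the same behavior (exception names here are stand-ins, not the requests types):

```python
import time

class TransientError(Exception):
    """Stand-in for requests' ConnectionError/Timeout/HTTPError."""

def call_with_retry(fn, attempts: int = 3, transient=(TransientError,), delay: float = 0.0):
    # Retry only the listed transient exception types; anything else (e.g. a
    # JSONDecodeError or KeyError from a malformed body) propagates immediately.
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except transient:
            if attempt == attempts:
                raise
            time.sleep(delay)
```

The point of the predicate is visible in the behavior: transient failures are retried up to the limit, while a data-shaped bug fails on the first call.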
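The missing `_aget_text_embedding` can delegate to the sync path via `asyncio.to_thread`, mirroring `_aget_query_embedding`; a minimal sketch with a placeholder sync implementation (the real class subclasses `BaseEmbedding`):

```python
import asyncio

class CustomEmbeddingSketch:
    # Stand-in for the real CustomEmbedding; only the delegation pattern matters.
    def _get_text_embedding(self, text: str) -> list[float]:
        return [float(len(text))]  # placeholder sync implementation

    async def _aget_text_embedding(self, text: str) -> list[float]:
        # Run the blocking sync call in a worker thread so the event loop
        # stays free; same shape as _aget_query_embedding.
        return await asyncio.to_thread(self._get_text_embedding, text)
```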

In `@knowledge_engine/knowledge_engine/embedding/factory.py`:
- Around line 40-62: The OpenAI branch in factory.py currently hardcodes
"text-embedding-3-small" when model_id is falsy; change the fallback to prefer
runtime_config.model_name before the hardcoded default. Update both the
CustomEmbedding and OpenAIEmbedding invocations (the places constructing
model=... / model=... or model=... param) to use model_id or
runtime_config.model_name or "text-embedding-3-small". Ensure runtime_config is
referenced/imported where needed and use the same fallback logic in the
CustomEmbedding and OpenAIEmbedding construction sites.

In `@knowledge_engine/knowledge_engine/query/executor.py`:
- Around line 53-57: The retrieval_setting values are taken directly from config
using config.get(..., default) which doesn’t replace keys that exist with None;
update the logic that builds retrieval_setting (the retrieval_setting dict in
executor.py) to normalize None entries first by treating None the same as
missing: for each key ("top_k", "score_threshold", "retrieval_mode") read
config.get(key) and if the returned value is None substitute the intended
default (20, 0.7, "vector") before assigning into retrieval_setting so
downstream code never receives None for these fields.
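The None-as-missing normalization described above can be sketched in a few lines (function and constant names are illustrative):

```python
DEFAULTS = {"top_k": 20, "score_threshold": 0.7, "retrieval_mode": "vector"}

def build_retrieval_setting(config: dict) -> dict:
    # config.get(key, default) keeps an explicit None; treat None the same as
    # a missing key so downstream code never sees None for these fields.
    return {
        key: (config.get(key) if config.get(key) is not None else default)
        for key, default in DEFAULTS.items()
    }
```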

In `@knowledge_engine/knowledge_engine/retrieval/filters.py`:
- Around line 118-142: The operator variable is compared against lowercase
literal branches but isn't normalized, so mixed-case operators like "GTE" are
ignored; update the loop that processes metadata_condition["conditions"] to
normalize operator to lowercase (e.g., replace the current operator =
cond.get("operator", "eq") with obtaining it and calling .lower() safely) before
the series of if/elif checks (the block that appends filters for "eq", "ne",
"in", "nin", "gt", "gte", "lt", "lte", "contains", "text_match"); ensure you
guard against None when calling .lower() so existing checks for key and value
remain correct.
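A None-safe version of that operator lookup is small enough to show inline (helper name is illustrative):

```python
def normalize_operator(cond: dict) -> str:
    # Missing or None operator defaults to "eq"; mixed case like "GTE" is
    # lowered so the eq/ne/gt/gte/... branches match.
    op = cond.get("operator")
    return "eq" if op is None else str(op).lower()
```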

In `@knowledge_engine/knowledge_engine/storage/elasticsearch_backend.py`:
- Around line 468-498: list_documents() currently calls Elasticsearch.search
unconditionally which raises when the index doesn't exist; update list_documents
(use index_name and es_client created via Elasticsearch(self.url,
**self.es_kwargs)) to first check for the index's existence (e.g.
es_client.indices.exists(index=index_name)) or catch the Elasticsearch
NotFoundError around the es_client.search(...) call and return an empty
paginated result when the index is missing so the function returns an empty page
instead of raising.
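A sketch of that guard with a duck-typed client, so the pattern is testable without a live cluster (in the real backend `es_client` is `Elasticsearch(self.url, **self.es_kwargs)`; the paging shape here is illustrative):

```python
def list_documents_page(es_client, index_name: str, page: int = 1, page_size: int = 10) -> dict:
    # Return an empty page instead of raising when the index does not exist yet.
    if not es_client.indices.exists(index=index_name):
        return {"total": 0, "page": page, "page_size": page_size, "documents": []}
    resp = es_client.search(
        index=index_name,
        body={"query": {"match_all": {}}},
        from_=(page - 1) * page_size,
        size=page_size,
    )
    hits = resp["hits"]
    return {
        "total": hits["total"]["value"],
        "page": page,
        "page_size": page_size,
        "documents": [h["_source"] for h in hits["hits"]],
    }
```

Catching `elasticsearch.NotFoundError` around the `search` call is the equivalent alternative when avoiding the extra existence round trip matters.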

In `@knowledge_engine/knowledge_engine/storage/factory.py`:
- Around line 56-58: The function create_storage_backend_from_runtime_config
lacks a return type hint; update its signature to include the concrete storage
interface (for example -> StorageBackend) and import that type from the module
that defines the storage backend interface (e.g., the StorageBackend or
BaseStorage class in your storage backends module). Ensure the annotation
matches the actual object returned by create_storage_backend_from_runtime_config
(or use a Union of concrete backend classes if it can return multiple types).

In `@knowledge_runtime/knowledge_runtime/services/handlers.py`:
- Around line 148-156: The RemoteQueryRecord construction currently hard-codes
index_family="chunk_vector", which mislabels non-chunk results; update where
RemoteQueryRecord is created (the RemoteQueryRecord(...) call) to derive
index_family from the engine result/record (e.g., a field on record that denotes
the family) instead of using the literal "chunk_vector", and if the
engine/record does not supply a supported index family, raise or reject the
request (or explicitly handle mapping from engine types to allowed values) so
enabled_index_families / retrieval_policy remain consistent with results.
- Around line 176-180: The code is using the URL basename
(parsed_url/PurePosixPath(file_name)) as a source filename which yields
meaningless attachment IDs for backend_attachment_stream URLs; change the branch
in the function that handles request.content_ref.url so that after extracting
file_name you detect and reject non-meaningful names (e.g., numeric IDs or paths
matching "/content/{attachment_id}" or where file_name has no extension) and
instead use the provided fallback/source name (e.g., request.content_ref.name or
the function's fallback parameter) as the source_file; still compute suffix from
a real filename when present, otherwise set suffix to an empty string or infer
from fallback, and return the chosen source_file and suffix (update any
references to parsed_url, file_name, PurePosixPath to reflect this logic).

In `@shared/models/knowledge_runtime_protocol.py`:
- Around line 142-150: The model currently defaults knowledge_base_configs to an
empty list (Field(default_factory=list)), allowing malformed remote queries to
validate; change this so requests fail fast by making knowledge_base_configs
required (remove the default) or keep the field but add validation: implement a
Pydantic validator (e.g., `@root_validator` or
`@validator`('knowledge_base_configs')) in the same model to ensure
knowledge_base_configs is non-empty and that its entries
(RemoteKnowledgeBaseQueryConfig) correspond to knowledge_base_ids (lengths match
and each config's id or identifier matches an entry in knowledge_base_ids),
raising a ValueError if the check fails.
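A minimal Pydantic v2 sketch of that cross-field check, with the config entries simplified to a tiny inner model (real names come from `shared/models/knowledge_runtime_protocol.py`; the `Sketch` suffix marks these as illustrative):

```python
from pydantic import BaseModel, Field, model_validator

class KBQueryConfig(BaseModel):
    knowledge_base_id: str

class RemoteQueryRequestSketch(BaseModel):
    knowledge_base_ids: list[str]
    knowledge_base_configs: list[KBQueryConfig] = Field(default_factory=list)

    @model_validator(mode="after")
    def _check_configs(self) -> "RemoteQueryRequestSketch":
        # Fail fast: an empty or mismatched config list would otherwise
        # validate and only break later inside the runtime.
        if not self.knowledge_base_configs:
            raise ValueError("knowledge_base_configs must not be empty")
        config_ids = {c.knowledge_base_id for c in self.knowledge_base_configs}
        if config_ids != set(self.knowledge_base_ids):
            raise ValueError("knowledge_base_configs must match knowledge_base_ids")
        return self
```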

---

Duplicate comments:
In `@backend/app/services/rag/runtime_resolver.py`:
- Around line 425-447: The non-default branch in runtime_resolver.py currently
queries Kind without scoping by user_id; update the second db.query(Kind) block
(the branch where model_namespace != "default") to include the same
tenant-scoping and ordering as the default branch: filter on (Kind.user_id ==
user_id) | (Kind.user_id == 0) and order_by(Kind.user_id.desc()) so the lookup
uses the three identifying fields (Kind.namespace, Kind.name, Kind.user_id) and
picks tenant-owned over shared before calling .first(); keep the existing
filters for Kind.kind, Kind.name, Kind.namespace, and Kind.is_active.
- Around line 318-332: The helper _get_knowledge_base_record currently queries
by Kind.id only; change it to be tenant-scoped by accepting the requester/tenant
identity (e.g., add params for namespace and user_id or a requester object) and
update the query to filter on the three unique Kind identifiers (Kind.namespace,
Kind.name, Kind.user_id) in addition to Kind.kind == "KnowledgeBase" and
Kind.is_active, then update all callers of _get_knowledge_base_record to thread
the requester/tenant info so the lookup uses the tenant-aware path rather than
resolving by id alone.

In `@docker/knowledge_runtime/Dockerfile`:
- Around line 5-29: Create a non-root runtime user after the install steps in
the Dockerfile (after RUN uv pip install ...) and ensure /app, /shared, and
/knowledge_engine are owned/readable by that user; add a user/group (e.g.,
appuser), chown those directories to the new user, and switch to it with the
USER instruction before the existing CMD that runs uvicorn (the CMD invoking
"uvicorn knowledge_runtime.main:app ..."). Ensure permissions allow the service
to read/write as needed while the image no longer runs as root.

In `@knowledge_runtime/knowledge_runtime/core/config.py`:
- Line 15: Replace the hardcoded fallback string for the module-level constant
INTERNAL_SERVICE_TOKEN with a required environment lookup and fail fast if it's
missing; specifically, stop assigning "knowledge-runtime-token" and instead read
os.environ["INTERNAL_SERVICE_TOKEN"] (or use a helper that raises) and raise a
clear RuntimeError/ValueError during import if the env var is not set so the
service cannot start with a default token; update any initialization/tests that
assumed the default to provide the env var.
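The fail-fast lookup can be a tiny helper evaluated at import time (helper name is illustrative):

```python
import os

def require_env(name: str) -> str:
    # Fail fast at startup instead of silently running with a predictable
    # default token.
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} must be set in the environment; refusing to start without it")
    return value

# In config.py this would replace the hardcoded fallback:
# INTERNAL_SERVICE_TOKEN = require_env("INTERNAL_SERVICE_TOKEN")
```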

In `@shared/models/knowledge_runtime_protocol.py`:
- Around line 21-27: Change plain-string token fields to secure types: replace
the auth_token: str in BackendAttachmentStreamContentRef and any other models
that define token: str (see the other model around the 53-57 block) with
SecretStr so secrets aren't exposed via repr()/model_dump()/validation errors,
and update any header-building code that uses these models to call
.get_secret_value() to obtain the raw token (e.g., where Authorization or
download headers are constructed). Ensure imports for SecretStr are added and
all call sites retrieving the token are updated accordingly.
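A sketch of the `SecretStr` change and the one place the raw value is unwrapped (the `Sketch` suffix and helper name are illustrative, not the PR's code):

```python
from pydantic import BaseModel, SecretStr

class BackendAttachmentStreamContentRefSketch(BaseModel):
    url: str
    auth_token: SecretStr  # masked in repr()/model_dump(), not leaked in errors

def build_download_headers(ref: BackendAttachmentStreamContentRefSketch) -> dict[str, str]:
    # Call sites must unwrap explicitly; everything else sees a masked value.
    return {"Authorization": f"Bearer {ref.auth_token.get_secret_value()}"}
```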

---

Nitpick comments:
In `@backend/app/api/endpoints/adapter/retrievers.py`:
- Line 244: The explicit "del current_user" line is unconventional; instead
rename the unused dependency parameter to start with an underscore (e.g.,
_current_user) in the function signature that currently contains "del
current_user" (keep the Depends(...) so authentication still runs) and remove
the "del current_user" statement; alternatively, if you prefer to keep the name,
leave the parameter but prefix it with an underscore in that same function's
signature to signal it's intentionally unused (update any type hints
accordingly).

In `@backend/app/api/endpoints/internal/rag_content.py`:
- Around line 79-85: The current return uses Response(content=binary_data) which
loads the entire file into memory; change this to use StreamingResponse (from
starlette.responses) and stream the attachment in chunks instead of passing
binary_data directly. Implement a generator/iterable that reads the file-like
source in chunks (or wraps bytes in io.BytesIO and yields chunks) and pass that
generator to StreamingResponse with media_type=context.mime_type and
headers={"Content-Disposition":
_build_content_disposition(context.original_filename)}; update the return in the
same place where Response, binary_data, context.mime_type, and
_build_content_disposition are referenced.
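The chunking generator at the heart of that change is independent of FastAPI and easy to sketch; the generator below is what would be handed to `StreamingResponse(...)` together with `media_type` and the `Content-Disposition` header (names and chunk size are illustrative):

```python
import io
from typing import Iterator

CHUNK_SIZE = 64 * 1024

def iter_attachment(source: io.BufferedIOBase, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    # Yield fixed-size chunks so the response body is never fully
    # materialized in memory.
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            return
        yield chunk
```

For a source that is already `bytes`, wrapping it in `io.BytesIO` gives the same streaming interface.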

In `@backend/app/api/endpoints/rag.py`:
- Around line 107-108: When catching RemoteRagGatewayError in the endpoint, log
the caught exception before falling back to LocalRagGateway: update the except
RemoteRagGatewayError block to record the error (e.g., logger.error or
logger.exception) including the exception details and context (runtime_spec or
other relevant identifiers) and then call await
LocalRagGateway().query(runtime_spec, db=db); reference RemoteRagGatewayError,
LocalRagGateway, query, runtime_spec and db to locate the change.
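The log-then-fall-back shape can be sketched with stand-in gateway objects (the error class here is a local stand-in for the real `RemoteRagGatewayError`):

```python
import asyncio
import logging

logger = logging.getLogger("backend.rag")

class RemoteRagGatewayError(Exception):
    """Stand-in for the real gateway error type."""

async def query_with_fallback(remote_gateway, local_gateway, runtime_spec, db=None):
    try:
        return await remote_gateway.query(runtime_spec, db=db)
    except RemoteRagGatewayError:
        # Record why the remote path failed (with traceback) before silently
        # serving the request from the local gateway.
        logger.exception("Remote RAG gateway failed, falling back to local (spec=%r)", runtime_spec)
        return await local_gateway.query(runtime_spec, db=db)
```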

In `@backend/app/services/rag/retrieval/filters.py`:
- Line 12: The module-level export list __all__ is unsorted; please reorder it
alphabetically (e.g., ["build_elasticsearch_filters", "parse_metadata_filters"])
so __all__ is deterministic and satisfies static analysis; update the __all__
assignment near the top of the module (where __all__ is defined) to the sorted
list while leaving the functions parse_metadata_filters and
build_elasticsearch_filters untouched.

In `@backend/app/services/rag/runtime_specs.py`:
- Around line 5-17: The local redeclaration of RetrievalPolicy should be removed
and the alias imported from shared.models to avoid drift; replace the
Literal[...] block with an import of RetrievalPolicy from shared.models (same
module that defines RemoteKnowledgeBaseQueryConfig, RuntimeEmbeddingModelConfig,
RuntimeRetrievalConfig, RuntimeRetrieverConfig) and update any local references
to use that imported RetrievalPolicy symbol.

In `@backend/app/services/rag/storage/elasticsearch_backend.py`:
- Line 12: The __all__ export list is not alphabetized; update the module-level
__all__ (currently ["ElasticsearchBackend", "Elasticsearch"]) to be sorted
alphabetically so the exports read ["Elasticsearch", "ElasticsearchBackend"] to
satisfy RUF022 and consistent style.

In `@backend/tests/services/rag/test_runtime_specs.py`:
- Around line 143-154: The test test_index_source_enforces_coherent_shape
currently uses pytest.mark.parametrize with a single case; simplify by removing
the `@pytest.mark.parametrize` decorator and directly call
IndexSource(**{"source_type": "attachment"}) inside the with
pytest.raises(ValidationError, match="Field required") block (or alternatively
expand to more cases if you intend to add them), keeping the test name and the
assertion on IndexSource the same.

In `@knowledge_engine/knowledge_engine/embedding/factory.py`:
- Around line 30-80: The function _create_embedding_model_from_resolved_values
is doing protocol routing and object construction; refactor by extracting
protocol-specific builders: implement a helper _build_openai_embedding(protocol,
model_id, api_key, base_url, custom_headers, dimensions) that returns either
CustomEmbedding or OpenAIEmbedding for protocol "openai", and another helper
_build_custom_protocol_embedding(protocol, model_name, model_id, api_key,
base_url, custom_headers, dimensions) that validates base_url and returns
CustomEmbedding for protocols in {"cohere","jina","custom"}; update
_create_embedding_model_from_resolved_values to call those helpers for the
corresponding protocol branches and keep the final unsupported-protocol
ValueError in place.

In `@knowledge_engine/knowledge_engine/retrieval/filters.py`:
- Around line 20-98: Split parse_metadata_filters into smaller helpers: move the
OPERATOR_MAP constant out of parse_metadata_filters to a module-level constant
(e.g., OPERATOR_MAP) and create a helper function to build the filters list
(e.g., _build_filters_from_conditions(knowledge_id, metadata_condition)) that
contains the loop creating ExactMatchFilter, MetadataFilter and mapping
operators, plus a small helper (e.g., _get_condition_type(metadata_condition))
to determine the "and"/"or" string; then simplify parse_metadata_filters to call
these helpers and return MetadataFilters(filters=..., condition=...), keeping
the existing semantics and using ExactMatchFilter, MetadataFilter and
MetadataFilters as before.

In `@knowledge_engine/knowledge_engine/storage/factory.py`:
- Around line 28-76: Both functions create_storage_backend_from_config and
create_storage_backend_from_runtime_config duplicate the same validation against
STORAGE_BACKEND_REGISTRY and construction of the config dict; refactor by
extracting a small helper (e.g., _validate_and_build_storage_config or
_get_storage_backend_class_and_config) that accepts the raw inputs (storage_type
and a mapping or explicit params) and returns the validated backend_class and
the normalized config dict (with keys
"url","username","password","apiKey","indexStrategy","ext"); then replace the
bodies of create_storage_backend_from_config and
create_storage_backend_from_runtime_config to call that helper and simply return
backend_class(config). Ensure the helper handles lowercasing the type, raises
the same ValueError message when missing/unsupported, and applies defaults for
indexStrategy and ext.

In `@knowledge_engine/knowledge_engine/storage/milvus_backend.py`:
- Around line 59-61: The class attribute _milvusclient_config is defined with a
mutable default {} which can lead to shared-state bugs; change it to a type-only
annotation (e.g., _milvusclient_config: Dict[str, Any]) without assigning {} in
the class body and ensure the instance-specific dict is created in __init__ (or
where the lazy async client configuration is set) so subclasses that omit
super().__init__ won't accidentally share state.
- Around line 720-726: The current client.close() calls in the finally blocks of
the methods use a try-except-pass pattern that silently ignores exceptions. To
improve debugging, update these blocks in the methods containing client.close()
(including the finally blocks in the current method, test_connection, and
get_all_chunks) to catch exceptions and log the failure using a debug or error
log instead of passing silently. This will help surface connection close
failures without interrupting normal operation.

In `@knowledge_engine/knowledge_engine/storage/qdrant_backend.py`:
- Around line 563-570: The logging import is done inside an exception handler;
move "import logging" to the module level and create a module-level logger
(e.g., logger = logging.getLogger(__name__)) near the other imports so it's
consistent with other backends, then update the except block that currently
references a locally-imported logger to use the module-level logger (the block
that logs "[Qdrant] Failed to get all chunks for KB {knowledge_id}: {e}").

In `@knowledge_engine/tests/test_storage_factory.py`:
- Around line 39-46: The test currently asserts that captured["config"] lacks
the "type" key but doesn't document whether that omission is intentional; update
the test in test_storage_factory (the test that sets up captured and asserts
captured["config"]) to explicitly verify and document the transformation by
either adding an assertion that the original input contained "type": "fake" and
that the factory removed it, or by adding a concise comment explaining that
StorageFactory strips the backend "type" before passing config to the
constructor (reference the captured variable and the factory/create method used
in the test).

In `@knowledge_runtime/knowledge_runtime/api/internal/rag.py`:
- Around line 20-49: The endpoints index_document, delete_document_index, and
test_connection return generic dicts while query uses
response_model=RemoteQueryResponse; update each route decorator to include the
appropriate response_model (e.g., response_model=RemoteIndexResponse for
index_document, response_model=RemoteDeleteDocumentIndexResponse for
delete_document_index, and response_model=RemoteTestConnectionResponse for
test_connection) and change the function return type annotations to those
concrete response model classes; ensure you import the model classes and keep
the handlers calls (runtime_handlers.index_document,
runtime_handlers.delete_document_index, runtime_handlers.test_connection)
unchanged so they return/serialize to the specified models for consistent docs
and typing.

In `@knowledge_runtime/knowledge_runtime/services/handlers.py`:
- Around line 28-110: The four remote handler methods (index_document, query,
delete_document_index, test_connection) lack OpenTelemetry tracing; wrap each
async method with the `@trace_async` decorator (using an appropriate span_name and
tracer_name) and provide an extract_attributes callable that pulls request
identifiers and key context (e.g., request.request_id or request.id,
request.knowledge_base_id/knowledge_base_configs items,
request.index_owner_user_id, document_id/document_ref, query text or
retrieval_config identifiers) so every top-level entrypoint opens a span and
attaches those attributes for correlation; ensure the decorator is
imported/available and apply it to each method signature.

In `@knowledge_runtime/tests/test_internal_rag.py`:
- Around line 218-227: Add a new success-path test in
knowledge_runtime/tests/test_internal_rag.py that uses TestClient(create_app())
to POST to "/internal/rag/query" with a valid Authorization header and a payload
similar to the existing test; seed or mock the underlying retrieval/aggregation
layer (or use test fixtures) so the aggregation path runs and assert
response.status_code == 200 and response.json() contains the expected merged
records; specifically verify that records are merged correctly across sources,
sorted by relevance, truncated to any configured max_results, and that each
returned item includes the shaped fields "document_id" and "index_family" (use
the same helper/mocking utilities as other tests to locate functions interacting
with the retrieval/aggregation pipeline and validate merging, sorting, and
truncation behavior).

Comment on lines +113 to 116

```python
    except ValueError as e:
        raise HTTPException(status_code=400, detail=str(e))
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
```

⚠️ Potential issue | 🟡 Minor

Chain exceptions to preserve traceback for debugging.

Per the static analysis hint, exceptions raised within except clauses should use raise ... from err to preserve the original traceback, which aids debugging.

🔧 Proposed fix

```diff
-    except ValueError as e:
-        raise HTTPException(status_code=400, detail=str(e))
+    except ValueError as e:
+        raise HTTPException(status_code=400, detail=str(e)) from e
     except Exception as e:
-        raise HTTPException(status_code=500, detail=str(e))
+        raise HTTPException(status_code=500, detail=str(e)) from e
```
🧰 Tools
🪛 Ruff (0.15.9)

[warning] 114-114: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


[warning] 115-115: Do not catch blind exception: Exception

(BLE001)


[warning] 116-116: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/api/endpoints/rag.py` around lines 113 - 116, The except blocks
in rag.py currently re-raise HTTPException without chaining the original
exceptions, losing the original tracebacks; update both exception handlers in
the function containing these blocks to use exception chaining (e.g., raise
HTTPException(status_code=400, detail=str(e)) from e and raise
HTTPException(status_code=500, detail=str(e)) from e) so the original traceback
is preserved for debugging—locate the ValueError and generic Exception handlers
(the except ValueError as e and except Exception as e blocks) and add "from e"
to each raised HTTPException.

Comment on lines +238 to +267

```python
    @field_validator("RAG_RUNTIME_MODE", mode="before")
    @classmethod
    def parse_rag_runtime_mode(cls, v: Any) -> str | dict[str, str]:
        """Parse RAG runtime mode from a global value or JSON operation map."""
        if v is None:
            return "local"
        if isinstance(v, Mapping):
            return {
                str(key).strip().lower(): str(value).strip().lower()
                for key, value in v.items()
                if str(key).strip() and str(value).strip()
            }
        if isinstance(v, str):
            raw = v.strip()
            if not raw:
                return "local"
            if raw.startswith("{"):
                try:
                    parsed = json.loads(raw)
                except json.JSONDecodeError:
                    return raw.lower()
                if isinstance(parsed, Mapping):
                    return {
                        str(key).strip().lower(): str(value).strip().lower()
                        for key, value in parsed.items()
                        if str(key).strip() and str(value).strip()
                    }
            return raw.lower()
        return v
```


⚠️ Potential issue | 🟠 Major

Validate RAG_RUNTIME_MODE values (local/remote) during parsing.

Line 446 documents only two valid modes, but Lines 238-267 and Lines 543-550 currently accept arbitrary strings (e.g., "remtoe"). That can silently misroute operations instead of failing fast on misconfiguration.

Suggested fix

```diff
+ALLOWED_RAG_RUNTIME_MODES = {"local", "remote"}
+
+def _normalize_rag_mode(value: Any) -> str:
+    mode = str(value).strip().lower()
+    if mode not in ALLOWED_RAG_RUNTIME_MODES:
+        raise ValueError(
+            f"Invalid RAG runtime mode: {value}. "
+            f"Expected one of: {sorted(ALLOWED_RAG_RUNTIME_MODES)}"
+        )
+    return mode
+
     @field_validator("RAG_RUNTIME_MODE", mode="before")
     @classmethod
     def parse_rag_runtime_mode(cls, v: Any) -> str | dict[str, str]:
         """Parse RAG runtime mode from a global value or JSON operation map."""
         if v is None:
             return "local"
         if isinstance(v, Mapping):
-            return {
-                str(key).strip().lower(): str(value).strip().lower()
-                for key, value in v.items()
-                if str(key).strip() and str(value).strip()
-            }
+            normalized: dict[str, str] = {}
+            for key, value in v.items():
+                key_s = str(key).strip().lower()
+                if not key_s:
+                    continue
+                normalized[key_s] = _normalize_rag_mode(value)
+            return normalized
         if isinstance(v, str):
             raw = v.strip()
             if not raw:
                 return "local"
             if raw.startswith("{"):
                 try:
                     parsed = json.loads(raw)
                 except json.JSONDecodeError:
-                    return raw.lower()
+                    return _normalize_rag_mode(raw)
                 if isinstance(parsed, Mapping):
-                    return {
-                        str(key).strip().lower(): str(value).strip().lower()
-                        for key, value in parsed.items()
-                        if str(key).strip() and str(value).strip()
-                    }
-            return raw.lower()
+                    normalized: dict[str, str] = {}
+                    for key, value in parsed.items():
+                        key_s = str(key).strip().lower()
+                        if not key_s:
+                            continue
+                        normalized[key_s] = _normalize_rag_mode(value)
+                    return normalized
+            return _normalize_rag_mode(raw)
         return v
```

Also applies to: 543-550

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/core/config.py` around lines 238 - 267, Update the
parse_rag_runtime_mode validator to enforce only "local" or "remote" values:
when v is None return "local"; when v is a Mapping parse keys/values to
lowercased strings but validate each resulting value is exactly "local" or
"remote" and raise ValueError for any invalid entry; when v is a string trim
and, if JSON, decode to a Mapping and validate as above, otherwise ensure the
final lowercased string equals "local" or "remote" and raise ValueError if not;
apply the same strict validation logic to the other validator referenced around
lines 543-550 so both parse/field_validator paths only accept "local" or
"remote" and fail-fast on misconfiguration.

Comment on lines +15 to +26
async def test_connection_local(
    spec: ConnectionTestRuntimeSpec,
    *,
    db: Session | None = None,
) -> dict:
    del db
    storage_backend = create_storage_backend_from_runtime_config(spec.retriever_config)
    success = await asyncio.to_thread(storage_backend.test_connection)
    return {
        "success": success,
        "message": "Connection successful" if success else "Connection failed",
    }

🛠️ Refactor suggestion | 🟠 Major

Add OpenTelemetry tracing to this new async executor.

test_connection_local is a new async boundary and should be wrapped with @trace_async(...) for consistent observability.

📈 Proposed fix
 from app.services.rag.runtime_specs import ConnectionTestRuntimeSpec
 from app.services.rag.storage.factory import create_storage_backend_from_runtime_config
+from shared.telemetry.decorators import trace_async
@@
+@trace_async(
+    span_name="rag.test_connection_local",
+    tracer_name="backend.services.rag",
+)
 async def test_connection_local(
     spec: ConnectionTestRuntimeSpec,
     *,
     db: Session | None = None,
-) -> dict:
+) -> dict[str, bool | str]:

As per coding guidelines, "Use @trace_async(span_name, tracer_name, extract_attributes) decorator to trace entire async functions in OpenTelemetry".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/rag/local_data_plane/administration.py` around lines 15
- 26, The new async executor test_connection_local lacks OpenTelemetry tracing;
add the `@trace_async` decorator to the async function (use trace_async with a
clear span name like "administration.test_connection_local" and the appropriate
tracer name, e.g., "backend") and supply an extract_attributes callable that
records useful inputs (e.g., spec.retriever_config or other non-sensitive
attributes) so the whole async boundary is traced; update the function
definition for test_connection_local to be decorated with trace_async
accordingly.

Comment on lines +164 to +180
def _build_index_storage_backend(
    spec: IndexRuntimeSpec,
    *,
    db: Session,
):
    if spec.retriever_config is not None:
        return create_storage_backend_from_runtime_config(spec.retriever_config)

    retriever = retriever_kinds_service.get_retriever(
        db=db,
-        user_id=runtime_user_id,
-        name=retriever_name,
-        namespace=retriever_namespace,
+        user_id=spec.index_owner_user_id,
+        name=spec.retriever_name,
+        namespace=spec.retriever_namespace,
    )
    if retriever is None:
-        logger.warning(
-            "Retriever %s not found for KB %s during delete-index cleanup",
-            retriever_name,
-            knowledge_base_id,
-        )
-        return {
-            "status": "skipped",
-            "reason": "retriever_not_found",
-            "knowledge_id": str(knowledge_base_id),
-            "doc_ref": document_ref,
-        }
+        raise ValueError("retriever_not_found")
    return create_storage_backend(retriever)

⚠️ Potential issue | 🟡 Minor

Add return type annotations to helper functions.

Per coding guidelines, type hints are required. The helper functions _build_index_storage_backend and _build_index_embed_model are missing return type annotations.

🔧 Proposed fix
+from app.services.rag.storage.base import BaseStorageBackend
+from llama_index.core.base.embeddings.base import BaseEmbedding

 def _build_index_storage_backend(
     spec: IndexRuntimeSpec,
     *,
     db: Session,
-):
+) -> BaseStorageBackend:
 def _build_index_embed_model(
     spec: IndexRuntimeSpec,
     *,
     db: Session,
-):
+) -> BaseEmbedding:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/rag/local_data_plane/indexing.py` around lines 164 -
180, The helper functions _build_index_storage_backend and
_build_index_embed_model are missing return type annotations; update their
signatures to include appropriate return types (e.g., the concrete storage
backend type returned by
create_storage_backend/create_storage_backend_from_runtime_config and the embed
model type returned by whatever create_embed_model/_embed_model_factory returns)
so static typing is satisfied; locate the functions named
_build_index_storage_backend and _build_index_embed_model in indexing.py and add
explicit return type hints matching the actual classes/interfaces they return
(or use a suitable Protocol/Union/typing.Any if multiple types are possible),
and update any imports if necessary to reference those types.

"parse_metadata_filters",
"build_elasticsearch_filters",
]
__all__ = ["parse_metadata_filters", "build_elasticsearch_filters"]

⚠️ Potential issue | 🟡 Minor

Sort __all__ to satisfy Ruff RUF022.

Current ordering is not isort-style and will keep triggering lint noise.

🧹 Proposed fix
-__all__ = ["parse_metadata_filters", "build_elasticsearch_filters"]
+__all__ = ["build_elasticsearch_filters", "parse_metadata_filters"]
🧰 Tools
🪛 Ruff (0.15.9)

[warning] 12-12: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/rag/retrieval/__init__.py` at line 12, The __all__
export list in the module is unsorted which triggers Ruff RUF022; update the
__all__ declaration to list the exported names in sorted order, e.g., ensure
"__all__ = ['build_elasticsearch_filters', 'parse_metadata_filters']" so the
symbol names (build_elasticsearch_filters and parse_metadata_filters) are
alphabetized.

Comment thread knowledge_engine/knowledge_engine/storage/elasticsearch_backend.py
Comment on lines +56 to +58
def create_storage_backend_from_runtime_config(
    retriever_config: RuntimeRetrieverConfig,
):

⚠️ Potential issue | 🟡 Minor

Add return type annotation.

The function create_storage_backend_from_runtime_config is missing a return type annotation. Per coding guidelines, type hints are required.

 def create_storage_backend_from_runtime_config(
     retriever_config: RuntimeRetrieverConfig,
-):
+) -> BaseStorageBackend:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@knowledge_engine/knowledge_engine/storage/factory.py` around lines 56 - 58,
The function create_storage_backend_from_runtime_config lacks a return type
hint; update its signature to include the concrete storage interface (for
example -> StorageBackend) and import that type from the module that defines the
storage backend interface (e.g., the StorageBackend or BaseStorage class in your
storage backends module). Ensure the annotation matches the actual object
returned by create_storage_backend_from_runtime_config (or use a Union of
concrete backend classes if it can return multiple types).

Comment on lines +148 to +156
RemoteQueryRecord(
    content=record.get("content", ""),
    title=record.get("title", "Unknown"),
    score=record.get("score"),
    metadata=record.get("metadata"),
    knowledge_base_id=knowledge_base_id,
    document_id=self._extract_document_id(record.get("metadata")),
    index_family="chunk_vector",
)

⚠️ Potential issue | 🟠 Major

Preserve the actual index family in RemoteQueryRecord.

index_family is hard-coded to "chunk_vector", so remote mode will mislabel summary-family hits and diverge from enabled_index_families / retrieval_policy once non-chunk retrieval is enabled. Derive this from the engine result, or reject non-chunk requests until the runtime can preserve it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@knowledge_runtime/knowledge_runtime/services/handlers.py` around lines 148 -
156, The RemoteQueryRecord construction currently hard-codes
index_family="chunk_vector", which mislabels non-chunk results; update where
RemoteQueryRecord is created (the RemoteQueryRecord(...) call) to derive
index_family from the engine result/record (e.g., a field on record that denotes
the family) instead of using the literal "chunk_vector", and if the
engine/record does not supply a supported index family, raise or reject the
request (or explicitly handle mapping from engine types to allowed values) so
enabled_index_families / retrieval_policy remain consistent with results.

Comment on lines +176 to +180
parsed_url = urlparse(request.content_ref.url)
file_name = PurePosixPath(parsed_url.path).name
if file_name:
    suffix = PurePosixPath(file_name).suffix.lower()
    return file_name, suffix

⚠️ Potential issue | 🟡 Minor

Don't treat backend stream URLs as source filenames.

For backend_attachment_stream, the basename of .../content/{attachment_id} is just the attachment id, so this branch stores a meaningless numeric source_file instead of using the fallback name. That leaks poor titles into indexed metadata and downstream query results whenever presigned URLs are unavailable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@knowledge_runtime/knowledge_runtime/services/handlers.py` around lines 176 -
180, The code is using the URL basename (parsed_url/PurePosixPath(file_name)) as
a source filename which yields meaningless attachment IDs for
backend_attachment_stream URLs; change the branch in the function that handles
request.content_ref.url so that after extracting file_name you detect and reject
non-meaningful names (e.g., numeric IDs or paths matching
"/content/{attachment_id}" or where file_name has no extension) and instead use
the provided fallback/source name (e.g., request.content_ref.name or the
function's fallback parameter) as the source_file; still compute suffix from a
real filename when present, otherwise set suffix to an empty string or infer
from fallback, and return the chosen source_file and suffix (update any
references to parsed_url, file_name, PurePosixPath to reflect this logic).
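The fallback logic the comment asks for can be sketched with the stdlib alone (`resolve_source_file` is a hypothetical helper name; the handler's real structure may differ):

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def resolve_source_file(url: str, fallback_name: str) -> tuple[str, str]:
    """Prefer the URL basename, but fall back when it is just an attachment id."""
    file_name = PurePosixPath(urlparse(url).path).name
    suffix = PurePosixPath(file_name).suffix.lower()
    # Backend stream URLs like .../content/{attachment_id} end in a bare
    # numeric id with no extension; using that as source_file would pollute
    # indexed metadata and downstream query titles.
    if file_name and (suffix or not file_name.isdigit()):
        return file_name, suffix
    return fallback_name, PurePosixPath(fallback_name).suffix.lower()
```

A presigned S3 URL keeps its real basename, while a `/content/12345` stream URL drops through to the caller-supplied fallback name.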

Comment thread shared/models/knowledge_runtime_protocol.py Outdated

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
backend/app/services/rag/storage/__init__.py (1)

7-19: Import ordering note.

Per PEP 8 and isort convention, imports are typically ordered: standard library → third-party packages → local application imports. Here, local imports (app.services.rag.storage.factory) appear before the knowledge_engine package imports. If knowledge_engine is treated as a third-party/external package (even if it's a sibling package in this monorepo), consider reordering.

However, if the linter/formatter doesn't flag this or if knowledge_engine is configured as a first-party package in your isort config, this is fine as-is.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/rag/storage/__init__.py` around lines 7 - 19, The
imports in storage/__init__.py are out of conventional order; move the
knowledge_engine imports (BaseStorageBackend, ChunkMetadata,
ElasticsearchBackend, MilvusBackend, QdrantBackend) above the
app.services.rag.storage.factory imports (create_storage_backend,
create_storage_backend_from_config, create_storage_backend_from_runtime_config,
get_all_storage_retrieval_methods, get_supported_retrieval_methods,
get_supported_storage_types) so third-party packages come before local
application imports, or alternatively mark `knowledge_engine` as first-party in
your isort config if that ordering is intentional.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@backend/app/services/rag/storage/__init__.py`:
- Around line 7-19: The imports in storage/__init__.py are out of conventional
order; move the knowledge_engine imports (BaseStorageBackend, ChunkMetadata,
ElasticsearchBackend, MilvusBackend, QdrantBackend) above the
app.services.rag.storage.factory imports (create_storage_backend,
create_storage_backend_from_config, create_storage_backend_from_runtime_config,
get_all_storage_retrieval_methods, get_supported_retrieval_methods,
get_supported_storage_types) so third-party packages come before local
application imports, or alternatively mark `knowledge_engine` as first-party in
your isort config if that ordering is intentional.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 987909b7-915c-4af7-97d2-c08cc7dbb283

📥 Commits

Reviewing files that changed from the base of the PR and between 4b1dcc6 and 5014e28.

📒 Files selected for processing (3)
  • backend/app/services/rag/embedding/__init__.py
  • backend/app/services/rag/retrieval/__init__.py
  • backend/app/services/rag/storage/__init__.py


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

♻️ Duplicate comments (8)
knowledge_engine/knowledge_engine/embedding/custom.py (1)

75-77: ⚠️ Potential issue | 🟡 Minor

Harden embedding extraction for malformed API payloads.

Direct indexing into nested fields can raise opaque errors and obscure upstream API issues.

Proposed change
         response.raise_for_status()
-        embedding = response.json()["data"][0]["embedding"]
+        try:
+            payload = response.json()
+            embedding = payload["data"][0]["embedding"]
+        except (KeyError, IndexError, TypeError) as exc:
+            raise ValueError(f"Malformed embedding API response: {exc}") from exc
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@knowledge_engine/knowledge_engine/embedding/custom.py` around lines 75 - 77,
The code directly indexes response.json()["data"][0]["embedding"] which can
raise opaque KeyError/IndexError/TypeError for malformed API payloads; replace
that with a defensive extraction: call payload = response.json(), verify
isinstance(payload, dict), ensure "data" in payload and
isinstance(payload["data"], list) and len(payload["data"])>0 and
isinstance(payload["data"][0], dict) and "embedding" in payload["data"][0], then
set embedding = payload["data"][0]["embedding"]; if any check fails raise a
clear ValueError (or custom exception) that includes the full payload and a
descriptive message so upstream API issues are visible (wrap extraction in
try/except to re-raise with payload on unexpected errors).
shared/models/knowledge_runtime_protocol.py (1)

21-27: ⚠️ Potential issue | 🟠 Major

Mask transport-layer tokens in these models.

Line 26 and Line 57 keep live bearer/download tokens as plain str, so repr, debug logs, and accidental model_dump() calls can leak credentials. Use SecretStr or an excluded field here and unwrap it explicitly only at the transport edge (for example knowledge_runtime/knowledge_runtime/services/content_fetcher.py:20 and any header builders). Please verify the callers still send the raw token value after the change.

Also applies to: 53-57

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shared/models/knowledge_runtime_protocol.py` around lines 21 - 27, The
auth_token fields in BackendAttachmentStreamContentRef and the similar download
model should not be plain str (they risk leaking in repr/model_dump); change
their type to pydantic SecretStr (or an excluded field with exclude=True in the
model config) and update any code that reads the token to explicitly unwrap it
at the transport edge (e.g., in knowledge_runtime/services/content_fetcher.py
and any header builder functions) so callers still provide the raw token but it
is only revealed when creating request headers; ensure model serialization
excludes the secret by default.
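The masking pattern can be illustrated with a stdlib stand-in (in the actual models pydantic's `SecretStr` plays this role; `MaskedToken` and `build_auth_headers` here are hypothetical names):

```python
class MaskedToken:
    """Minimal stand-in for pydantic's SecretStr: hides the value in repr/str."""

    def __init__(self, value: str) -> None:
        self._value = value

    def __repr__(self) -> str:
        return "MaskedToken('**********')"

    __str__ = __repr__

    def get_secret_value(self) -> str:
        return self._value


def build_auth_headers(token: MaskedToken) -> dict[str, str]:
    # Unwrap only at the transport edge, when the request is actually built;
    # everywhere else (logs, repr, serialization) the token stays masked.
    return {"Authorization": f"Bearer {token.get_secret_value()}"}
```

Accidental `repr()` or debug logging of a model holding the token then shows the mask, while the header builder still sends the raw value.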
backend/app/services/rag/remote_gateway.py (2)

53-65: ⚠️ Potential issue | 🟠 Major

Reject a blank internal token in __init__.

app.core.config.Settings.INTERNAL_SERVICE_TOKEN defaults to an empty string, so Lines 61-65 currently build an `Authorization: Bearer ` header with an empty token, turning a startup misconfiguration into opaque 401s. Validate self._token once in the constructor and fail fast.

Suggested fix
         self._base_url = (base_url or settings.KNOWLEDGE_RUNTIME_URL).rstrip("/")
         self._token = token if token is not None else settings.INTERNAL_SERVICE_TOKEN
+        if not self._token:
+            raise ValueError(
+                "INTERNAL_SERVICE_TOKEN must be configured for RemoteRagGateway"
+            )
         self._timeout = timeout
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/rag/remote_gateway.py` around lines 53 - 65, The
constructor __init__ of RemoteGateway currently allows an empty
INTERNAL_SERVICE_TOKEN and _build_headers then emits an Authorization: Bearer 
header causing opaque 401s; validate that the resolved token (self._token) is
not None or an empty string in __init__ (after assigning from
base_url/token/settings.INTERNAL_SERVICE_TOKEN) and raise a clear exception
(e.g., ValueError) if it's blank so the service fails fast; keep _build_headers
as-is but assume it will always have a non-empty token after this validation.

67-77: ⚠️ Potential issue | 🟠 Major

Normalize transport/schema failures before they escape the gateway.

Lines 68-77 can still raise raw httpx transport or JSON decode errors, and Line 169 can raise ValidationError. Those exceptions bypass RemoteRagGatewayError, so callers lose the gateway's failure contract and any remote→local fallback never triggers. Please verify _post_model() and query() wrap these failures and re-raise RemoteRagGatewayError.

Also applies to: 150-169

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/rag/remote_gateway.py` around lines 67 - 77, _post_model
and query currently let raw httpx transport/decoding errors and pydantic
ValidationError escape the gateway; wrap the network/JSON/validation work in
try/except blocks and re-raise RemoteRagGatewayError so callers see the gateway
failure contract. Specifically, in _post_model, wrap the AsyncClient.post call
and the response.json() call in a try/except that catches
httpx.TransportError/httpx.HTTPError (or broader httpx.Error) and
JSONDecodeError (or httpx.DecodingError) and raise RemoteRagGatewayError(...,
cause=exc) via _raise_remote_error or by constructing RemoteRagGatewayError with
the original exception attached. Likewise, in query wrap the code that
constructs/validates the response model (the pydantic validation path that can
raise pydantic.ValidationError) and convert that into RemoteRagGatewayError so
validation failures are normalized to the gateway error contract.
backend/app/services/rag/runtime_resolver.py (2)

217-226: ⚠️ Potential issue | 🔴 Critical

Query/delete runtime specs are still loading KBs without requester scoping.

_get_knowledge_base_record() only filters by global Kind.id, and Lines 226 and 261 reuse it while building delete/query runtime specs. That lets any active KB id drive retriever/model resolution even when the caller should not have access; thread user_id through this helper and use the same access-checked lookup pattern as get_kb_index_info() or KnowledgeService.get_knowledge_base().

Based on learnings, "Kind resource is uniquely identified by THREE fields: namespace, name, and user_id - always use all three fields to locate a Kind".

Also applies to: 252-263, 326-340

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/rag/runtime_resolver.py` around lines 217 - 226,
build_delete_runtime_spec and similar helpers call _get_knowledge_base_record
without scoping by requester user, allowing KB lookup by id alone; thread a
requester_user_id (or index_owner_user_id) through build_delete_runtime_spec,
build_query_runtime_spec, and any callers, and update _get_knowledge_base_record
to require and use the requester_user_id when resolving the Kind (or replace the
call with the same access-checked lookup used by get_kb_index_info /
KnowledgeService.get_knowledge_base) so the Kind is located using namespace,
name and user_id; apply the same change to the other affected methods that reuse
_get_knowledge_base_record (the query/delete runtime spec builders and the
resolver code around build_query_runtime_spec/build_delete_runtime_spec).

433-455: ⚠️ Potential issue | 🔴 Critical

Keep non-default model lookups tenant-scoped too.

The default branch applies the tenant/shared filter, but Lines 446-455 drop it entirely. If two tenants define the same namespace/name, this branch can resolve another tenant's model and forward its decrypted config downstream.

Suggested fix
         return (
             db.query(Kind)
             .filter(
                 Kind.kind == "Model",
                 Kind.name == model_name,
                 Kind.namespace == model_namespace,
                 Kind.is_active == True,
             )
+            .filter((Kind.user_id == user_id) | (Kind.user_id == 0))
+            .order_by(Kind.user_id.desc())
             .first()
         )

Based on learnings, "Kind resource is uniquely identified by THREE fields: namespace, name, and user_id - always use all three fields to locate a Kind".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/services/rag/runtime_resolver.py` around lines 433 - 455, The
non-`default` branch in runtime_resolver currently omits tenant scoping so it
can return another tenant's Model; update the query in that branch to include
the tenant/user filter so the Kind is located by the three identifying fields
(Kind.kind, Kind.name, Kind.namespace) plus Kind.user_id. Specifically, in the
code that builds the db.query(Kind) for the non-`default` case (the same place
that currently filters Kind.kind, Kind.name, Kind.namespace, Kind.is_active),
add a filter for Kind.user_id == user_id (or use the same (Kind.user_id ==
user_id) | (Kind.user_id == 0) logic if you intentionally want shared fallbacks)
and then .order_by(Kind.user_id.desc()).first() so lookups are always
tenant-scoped and uniquely identified.
knowledge_runtime/knowledge_runtime/services/handlers.py (2)

147-156: ⚠️ Potential issue | 🟠 Major

Preserve real index_family from engine records instead of hard-coding.

Line 155 hard-codes "chunk_vector", which can mislabel results once non-chunk families are returned.

🧩 Suggested mapping update
-    return [
-        RemoteQueryRecord(
-            content=record.get("content", ""),
-            title=record.get("title", "Unknown"),
-            score=record.get("score"),
-            metadata=record.get("metadata"),
-            knowledge_base_id=knowledge_base_id,
-            document_id=self._extract_document_id(record.get("metadata")),
-            index_family="chunk_vector",
-        )
-        for record in records
-    ]
+    built_records: list[RemoteQueryRecord] = []
+    for record in records:
+        metadata = record.get("metadata") if isinstance(record.get("metadata"), dict) else {}
+        index_family = record.get("index_family") or metadata.get("index_family")
+        if not index_family:
+            raise ValueError("Missing index_family in query record payload")
+        built_records.append(
+            RemoteQueryRecord(
+                content=record.get("content", ""),
+                title=record.get("title", "Unknown"),
+                score=record.get("score"),
+                metadata=record.get("metadata"),
+                knowledge_base_id=knowledge_base_id,
+                document_id=self._extract_document_id(record.get("metadata")),
+                index_family=index_family,
+            )
+        )
+    return built_records
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@knowledge_runtime/knowledge_runtime/services/handlers.py` around lines 147 -
156, The code is hard-coding index_family="chunk_vector" when building
RemoteQueryRecord in the handler; change it to preserve the source index family
by reading the engine record's field (e.g., record.get("index_family") or
record.get("family")) and fall back to "chunk_vector" if missing, so update the
RemoteQueryRecord construction in the handler that builds the record (the block
creating RemoteQueryRecord with document_id from _extract_document_id) to use
the record-sourced value instead of the literal "chunk_vector".

180-184: ⚠️ Potential issue | 🟡 Minor

Avoid using attachment-id URL basenames as source_file.

Lines 180-184 treat any URL basename as filename; for backend_attachment_stream paths this can become a numeric id and pollute document metadata.

🛠️ Safer filename resolution
         parsed_url = urlparse(request.content_ref.url)
         file_name = PurePosixPath(parsed_url.path).name
         if file_name:
             suffix = PurePosixPath(file_name).suffix.lower()
-            return file_name, suffix
+            if suffix or not file_name.isdigit():
+                return file_name, suffix
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@knowledge_runtime/knowledge_runtime/services/handlers.py` around lines 180 -
184, The code uses the URL basename (file_name from parsed_url.path) as the
source filename which can be a numeric attachment id for
backend_attachment_stream URLs; update the logic around parsed_url, file_name
and suffix so that if the path contains "backend_attachment_stream" or the
derived file_name is purely numeric (file_name.isdigit()), you do NOT treat it
as a real filename — instead return a neutral value (e.g., None or empty string)
or fall back to other metadata; otherwise keep the existing behavior of
returning file_name and suffix. Ensure the checks are applied where parsed_url,
file_name and suffix are computed so numeric/attachment-id basenames are
excluded from document metadata.
🧹 Nitpick comments (3)
knowledge_engine/tests/test_retrieval_filters.py (1)

8-24: Consider adding edge cases for completeness.

While this test covers the mixed-case scenario well, consider adding tests for:

  • Already-lowercase operators (to verify no regression)
  • Unknown/invalid operators (to document the silent-ignore behavior)
  • Empty or missing operator field (to verify the "eq" default)
💡 Example additional test cases
def test_build_elasticsearch_filters_handles_lowercase_operators() -> None:
    """Verify lowercase operators work correctly (regression guard)."""
    filters = build_elasticsearch_filters(
        "kb_1",
        {
            "operator": "and",
            "conditions": [
                {"key": "status", "operator": "eq", "value": "active"},
            ],
        },
    )
    assert {"term": {"metadata.status.keyword": "active"}} in filters


def test_build_elasticsearch_filters_ignores_unknown_operators() -> None:
    """Unknown operators should be silently ignored."""
    filters = build_elasticsearch_filters(
        "kb_1",
        {
            "operator": "and",
            "conditions": [
                {"key": "field", "operator": "UNKNOWN", "value": "val"},
            ],
        },
    )
    # Only the base knowledge_id filter should be present
    assert filters == [{"term": {"metadata.knowledge_id.keyword": "kb_1"}}]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@knowledge_engine/tests/test_retrieval_filters.py` around lines 8 - 24, Add
unit tests exercising edge cases for build_elasticsearch_filters: (1) verify
already-lowercase operators (e.g., operator "eq") produce expected filters to
guard against regressions by asserting the corresponding term filter appears;
(2) verify unknown/invalid operators are silently ignored by calling
build_elasticsearch_filters with a condition using an unknown operator and
asserting only the base knowledge_id filter is returned; and (3) verify empty or
missing condition operator defaults to "eq" by omitting the operator field (or
using empty string) and asserting a term filter is produced for that key. Use
the existing test file/test naming style and reference
build_elasticsearch_filters in each new test.
knowledge_engine/knowledge_engine/storage/elasticsearch_backend.py (1)

459-542: Consider splitting list_documents into smaller helpers.

This method is getting large; extracting query construction / aggregation parsing / pagination into helpers would improve maintainability.

As per coding guidelines, "Function length: Max 50 lines per function (preferred)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@knowledge_engine/knowledge_engine/storage/elasticsearch_backend.py` around
lines 459 - 542, list_documents is too long; extract the query construction,
aggregation parsing and pagination into small helpers: create a helper
build_search_body(knowledge_id, size=0) that returns the search_body dict (used
where search_body is built), a helper parse_aggregation_buckets(aggregations)
that converts response["aggregations"]["documents"]["buckets"] into the all_docs
list (extract doc_ref, source_file, chunk_count, created_at) and a helper
paginate_documents(all_docs, page, page_size) that returns the paginated slice
and total; then have list_documents call Elasticsearch.search with
build_search_body, pass response["aggregations"] to parse_aggregation_buckets,
and pass the result to paginate_documents before returning the final dict.
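A rough sketch of that decomposition (the bucket field names `key` / `doc_count` follow the Elasticsearch terms-aggregation shape; the row keys like `doc_ref` are assumptions taken from the prompt above):

```python
def parse_aggregation_buckets(aggregations: dict) -> list[dict]:
    """Flatten ES terms-aggregation buckets into one summary row per document."""
    return [
        {"doc_ref": bucket["key"], "chunk_count": bucket["doc_count"]}
        for bucket in aggregations["documents"]["buckets"]
    ]


def paginate_documents(all_docs: list[dict], page: int, page_size: int) -> dict:
    """Slice the flattened rows and report totals for the response payload."""
    start = (page - 1) * page_size
    return {
        "documents": all_docs[start : start + page_size],
        "total": len(all_docs),
        "page": page,
        "page_size": page_size,
    }
```

`list_documents` then shrinks to: build the search body, run the search, feed `response["aggregations"]` through `parse_aggregation_buckets`, and paginate.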
knowledge_runtime/knowledge_runtime/services/handlers.py (1)

28-110: Instrument handler entrypoints with tracing decorators.

The async service handlers are currently untraced, which leaves remote execution paths under-instrumented.

As per coding guidelines, Python async functions should use @trace_async(span_name, tracer_name, extract_attributes) and span helpers (add_span_event, set_span_attribute) for OpenTelemetry.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@knowledge_runtime/knowledge_runtime/services/handlers.py` around lines 28 -
110, The handler entrypoints index_document, query, delete_document_index, and
test_connection are missing OpenTelemetry tracing: add the `@trace_async`
decorator to each async method (use descriptive span_name like
"handlers.index_document" and appropriate tracer_name) and instrument important
steps inside each function with span helpers (call add_span_event and
set_span_attribute around key operations such as storage_backend creation,
service/QueryExecutor instantiation, and external calls like
service.index_document_from_binary, executor.execute, service.delete_document,
and storage_backend.test_connection) so traces contain attributes like
knowledge_base_id, document_id, retrieval_config, and success; ensure
extract_attributes passed to trace_async captures request metadata needed for
indexing/querying.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/api/endpoints/internal/rag_content.py`:
- Around line 28-29: The current _binary_stream yields a single bytes buffer
because get_attachment_binary_data() materializes the entire attachment; change
the streaming contract so get_attachment_binary_data() (in ContextService)
returns a chunked iterator or file-like object and update _binary_stream to
iterate/read from that stream in fixed-size chunks (e.g., 8KB) and yield each
chunk, and update the caller in rag_content.py to pass the new
iterator/file-like instead of a bytes object; ensure any storage backends
implement the new stream-returning method so large attachments are streamed
rather than loaded fully into memory.

In `@backend/app/services/rag/runtime_resolver.py`:
- Around line 158-199: The code builds QueryRuntimeSpec for a KB but resolves
tenant-scoped resources using the requester's user_id; instead, resolve
retriever and embedding model configs using the KB owner (kb.user_id) so Kind
lookup uses the correct (namespace, name, user_id) tuple; update the calls to
_build_resolved_retriever_config and _build_resolved_embedding_model_config in
the QueryRuntimeSpec construction to pass user_id=kb.user_id (or an explicit
index_owner_user_id) so the resource resolution matches the KB owner rather than
the requester.

In `@docs/specs/knowledge/2026-04-03-rag-service-extraction-design.md`:
- Line 5: The page title currently repeats the sidebar category by starting with
"Knowledge Runtime 独立化设计"; update the title to remove the redundant "Knowledge"
prefix (e.g., change "Knowledge Runtime 独立化设计" to "Runtime 独立化设计" or another
concise non-redundant variant) so it no longer duplicates the sidebar category.

In `@knowledge_engine/knowledge_engine/embedding/custom.py`:
- Around line 58-68: The async methods _aget_query_embedding and
_aget_text_embedding and the sync method _call_api need OpenTelemetry
decorators: add `@trace_async`(...) to both _aget_query_embedding and
_aget_text_embedding and add `@trace_sync`(...) to _call_api, supplying an
appropriate span name (e.g., "embedding._aget_query_embedding",
"embedding._aget_text_embedding", "embedding._call_api"), the tracer name used
across the repo (use the same tracer_name constant used elsewhere), and the
extract_attributes function used by your tracing utilities so the full function
execution is traced according to repository standards.

In `@knowledge_runtime/knowledge_runtime/services/handlers.py`:
- Around line 49-73: The query handler currently ignores RemoteQueryRequest
fields enabled_index_families and retrieval_policy in the knowledge_base_configs
entries; add a fail-fast validation at the start of the loop in query (or before
executor creation) that checks each kb_config.enabled_index_families and
kb_config.retrieval_policy and raises/returns an explicit error (e.g.,
ValueError or RemoteQueryResponse error) if either is set and not supported,
mentioning the offending knowledge_base_id; reference the query method and the
knowledge_base_configs / kb_config objects to locate where to insert this
validation so the runtime no longer silently accepts unsupported multi-family or
policy requests.

---

Duplicate comments:
In `@backend/app/services/rag/remote_gateway.py`:
- Around line 53-65: The constructor __init__ of RemoteGateway currently allows
an empty INTERNAL_SERVICE_TOKEN and _build_headers then emits an Authorization:
Bearer  header causing opaque 401s; validate that the resolved token
(self._token) is not None or an empty string in __init__ (after assigning from
base_url/token/settings.INTERNAL_SERVICE_TOKEN) and raise a clear exception
(e.g., ValueError) if it's blank so the service fails fast; keep _build_headers
as-is but assume it will always have a non-empty token after this validation.
- Around line 67-77: _post_model and query currently let raw httpx
transport/decoding errors and pydantic ValidationError escape the gateway; wrap
the network/JSON/validation work in try/except blocks and re-raise
RemoteRagGatewayError so callers see the gateway failure contract. Specifically,
in _post_model, wrap the AsyncClient.post call and the response.json() call in a
try/except that catches httpx.TransportError/httpx.HTTPError (or broader
httpx.Error) and JSONDecodeError (or httpx.DecodingError) and raise
RemoteRagGatewayError(..., cause=exc) via _raise_remote_error or by constructing
RemoteRagGatewayError with the original exception attached. Likewise, in query
wrap the code that constructs/validates the response model (the pydantic
validation path that can raise pydantic.ValidationError) and convert that into
RemoteRagGatewayError so validation failures are normalized to the gateway error
contract.

In `@backend/app/services/rag/runtime_resolver.py`:
- Around line 217-226: build_delete_runtime_spec and similar helpers call
_get_knowledge_base_record without scoping by requester user, allowing KB lookup
by id alone; thread a requester_user_id (or index_owner_user_id) through
build_delete_runtime_spec, build_query_runtime_spec, and any callers, and update
_get_knowledge_base_record to require and use the requester_user_id when
resolving the Kind (or replace the call with the same access-checked lookup used
by get_kb_index_info / KnowledgeService.get_knowledge_base) so the Kind is
located using namespace, name and user_id; apply the same change to the other
affected methods that reuse _get_knowledge_base_record (the query/delete runtime
spec builders and the resolver code around
build_query_runtime_spec/build_delete_runtime_spec).
- Around line 433-455: The non-`default` branch in runtime_resolver currently
omits tenant scoping so it can return another tenant's Model; update the query
in that branch to include the tenant/user filter so the Kind is located by the
three identifying fields (Kind.kind, Kind.name, Kind.namespace) plus
Kind.user_id. Specifically, in the code that builds the db.query(Kind) for the
non-`default` case (the same place that currently filters Kind.kind, Kind.name,
Kind.namespace, Kind.is_active), add a filter for Kind.user_id == user_id (or
use the same (Kind.user_id == user_id) | (Kind.user_id == 0) logic if you
intentionally want shared fallbacks) and then
.order_by(Kind.user_id.desc()).first() so lookups are always tenant-scoped and
uniquely identified.
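The tenant-scoped lookup described here can be sketched against a minimal stand-in model. The real `Kind` model and shared-fallback semantics live in the backend; this only illustrates the filter/`order_by` shape, assuming `user_id == 0` marks a shared resource.

```python
from sqlalchemy import Boolean, Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Kind(Base):
    """Minimal stand-in: namespace, name, and user_id together identify a Kind."""

    __tablename__ = "kinds"
    id = Column(Integer, primary_key=True)
    kind = Column(String, nullable=False)
    name = Column(String, nullable=False)
    namespace = Column(String, nullable=False)
    user_id = Column(Integer, nullable=False)
    is_active = Column(Boolean, default=True, nullable=False)


def find_kind(session, kind: str, name: str, namespace: str, user_id: int):
    """Tenant-scoped lookup with a shared (user_id == 0) fallback."""
    return (
        session.query(Kind)
        .filter(
            Kind.kind == kind,
            Kind.name == name,
            Kind.namespace == namespace,
            Kind.is_active.is_(True),
            (Kind.user_id == user_id) | (Kind.user_id == 0),
        )
        .order_by(Kind.user_id.desc())  # prefer the tenant's own resource over the shared one
        .first()
    )
```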

In `@knowledge_engine/knowledge_engine/embedding/custom.py`:
- Around line 75-77: The code directly indexes
response.json()["data"][0]["embedding"] which can raise opaque
KeyError/IndexError/TypeError for malformed API payloads; replace that with a
defensive extraction: call payload = response.json(), verify isinstance(payload,
dict), ensure "data" in payload and isinstance(payload["data"], list) and
len(payload["data"])>0 and isinstance(payload["data"][0], dict) and "embedding"
in payload["data"][0], then set embedding = payload["data"][0]["embedding"]; if
any check fails raise a clear ValueError (or custom exception) that includes the
full payload and a descriptive message so upstream API issues are visible (wrap
extraction in try/except to re-raise with payload on unexpected errors).
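The defensive extraction can be sketched as a small helper; the payload shape (`data[0].embedding`, OpenAI-style) is taken from the comment above:

```python
def extract_embedding(payload: object) -> list[float]:
    """Defensively pull data[0].embedding out of an embeddings API response."""
    if (
        isinstance(payload, dict)
        and isinstance(payload.get("data"), list)
        and payload["data"]
        and isinstance(payload["data"][0], dict)
        and "embedding" in payload["data"][0]
    ):
        return payload["data"][0]["embedding"]
    # Include the full payload so upstream API problems are visible, not opaque KeyErrors.
    raise ValueError(f"Unexpected embedding API payload: {payload!r}")
```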

In `@knowledge_runtime/knowledge_runtime/services/handlers.py`:
- Around line 147-156: The code is hard-coding index_family="chunk_vector" when
building RemoteQueryRecord in the handler; change it to preserve the source
index family by reading the engine record's field (e.g.,
record.get("index_family") or record.get("family")) and fall back to
"chunk_vector" if missing, so update the RemoteQueryRecord construction in the
handler that builds the record (the block creating RemoteQueryRecord with
document_id from _extract_document_id) to use the record-sourced value instead
of the literal "chunk_vector".
- Around line 180-184: The code uses the URL basename (file_name from
parsed_url.path) as the source filename which can be a numeric attachment id for
backend_attachment_stream URLs; update the logic around parsed_url, file_name
and suffix so that if the path contains "backend_attachment_stream" or the
derived file_name is purely numeric (file_name.isdigit()), you do NOT treat it
as a real filename — instead return a neutral value (e.g., None or empty string)
or fall back to other metadata; otherwise keep the existing behavior of
returning file_name and suffix. Ensure the checks are applied where parsed_url,
file_name and suffix are computed so numeric/attachment-id basenames are
excluded from document metadata.

In `@shared/models/knowledge_runtime_protocol.py`:
- Around line 21-27: The auth_token fields in BackendAttachmentStreamContentRef
and the similar download model should not be plain str (they risk leaking in
repr/model_dump); change their type to pydantic SecretStr (or an excluded field
with exclude=True in the model config) and update any code that reads the token
to explicitly unwrap it at the transport edge (e.g., in
knowledge_runtime/services/content_fetcher.py and any header builder functions)
so callers still provide the raw token but it is only revealed when creating
request headers; ensure model serialization excludes the secret by default.
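A sketch of the `SecretStr` approach with a deliberately simplified model — the field names follow the comment, but the real protocol model has more fields:

```python
from pydantic import BaseModel, SecretStr


class BackendAttachmentStreamContentRef(BaseModel):
    """Simplified sketch: the token is masked in repr() and default serialization."""

    url: str
    auth_token: SecretStr


ref = BackendAttachmentStreamContentRef(
    url="http://backend/internal/rag/backend_attachment_stream/1",
    auth_token="s3cret-token",
)

# Unwrap only at the transport edge, when building request headers:
headers = {"Authorization": f"Bearer {ref.auth_token.get_secret_value()}"}
```

Logging or dumping `ref` now shows `**********` instead of the token, so accidental leaks via `repr`/`model_dump` are closed off.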

---

Nitpick comments:
In `@knowledge_engine/tests/test_retrieval_filters.py`:
- Around line 8-24: Add unit tests exercising edge cases for
build_elasticsearch_filters: (1) verify already-lowercase operators (e.g.,
operator "eq") produce expected filters to guard against regressions by
asserting the corresponding term filter appears; (2) verify unknown/invalid
operators are silently ignored by calling build_elasticsearch_filters with a
condition using an unknown operator and asserting only the base knowledge_id
filter is returned; and (3) verify empty or missing condition operator defaults
to "eq" by omitting the operator field (or using empty string) and asserting a
term filter is produced for that key. Use the existing test file/test naming
style and reference build_elasticsearch_filters in each new test.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f6cc0853-b0d9-4405-914d-513d7b9c19c2

📥 Commits

Reviewing files that changed from the base of the PR and between 5014e28 and 743ee7e.

📒 Files selected for processing (24)
  • backend/app/api/endpoints/internal/rag_content.py
  • backend/app/services/rag/remote_gateway.py
  • backend/app/services/rag/runtime_resolver.py
  • backend/tests/api/endpoints/internal/test_rag_content_endpoint.py
  • backend/tests/services/rag/test_remote_gateway.py
  • backend/tests/services/rag/test_runtime_resolver.py
  • docs/plans/2026-04-04-rag-service-extraction-implementation-plan.md
  • docs/specs/knowledge/2026-04-03-rag-service-extraction-design.md
  • knowledge_engine/knowledge_engine/embedding/custom.py
  • knowledge_engine/knowledge_engine/embedding/factory.py
  • knowledge_engine/knowledge_engine/query/executor.py
  • knowledge_engine/knowledge_engine/retrieval/filters.py
  • knowledge_engine/knowledge_engine/storage/elasticsearch_backend.py
  • knowledge_engine/tests/storage/test_elasticsearch_backend.py
  • knowledge_engine/tests/test_embedding_factory.py
  • knowledge_engine/tests/test_query_executor.py
  • knowledge_engine/tests/test_retrieval_filters.py
  • knowledge_runtime/knowledge_runtime/core/config.py
  • knowledge_runtime/knowledge_runtime/services/handlers.py
  • knowledge_runtime/tests/conftest.py
  • knowledge_runtime/tests/test_config.py
  • knowledge_runtime/tests/test_query_execution.py
  • shared/models/knowledge_runtime_protocol.py
  • shared/tests/test_knowledge_runtime_protocol.py
✅ Files skipped from review due to trivial changes (1)
  • docs/plans/2026-04-04-rag-service-extraction-implementation-plan.md
🚧 Files skipped from review as they are similar to previous changes (7)
  • knowledge_engine/knowledge_engine/retrieval/filters.py
  • knowledge_engine/tests/test_query_executor.py
  • knowledge_runtime/tests/test_query_execution.py
  • knowledge_engine/knowledge_engine/query/executor.py
  • backend/tests/api/endpoints/internal/test_rag_content_endpoint.py
  • knowledge_engine/knowledge_engine/embedding/factory.py
  • backend/tests/services/rag/test_runtime_resolver.py

Comment on lines +28 to +29
def _binary_stream(binary_data: bytes) -> Iterator[bytes]:
yield binary_data

⚠️ Potential issue | 🟠 Major

This isn't actually streaming yet.

backend/app/services/context/context_service.py returns bytes from get_attachment_binary_data(), so Line 70 has already materialized the whole attachment and Line 85 just yields that buffer once. Large attachments will therefore spike Backend memory on the new remote-indexing path; this needs a chunked/file-like reader from the storage layer instead of wrapping a single bytes object.

Also applies to: 70-85


Comment on lines +158 to +199
kb, has_access = KnowledgeService.get_knowledge_base(
db=db,
knowledge_base_id=knowledge_base_id,
user_id=user_id,
)
if kb is None or not has_access:
raise ValueError(
f"Knowledge base {knowledge_base_id} not found or access denied"
)

return QueryRuntimeSpec(
knowledge_base_ids=[knowledge_base_id],
query=query,
max_results=max_results,
route_mode="rag_retrieval",
metadata_condition=metadata_condition,
user_id=user_id,
user_name=user_name,
knowledge_base_configs=[
QueryKnowledgeBaseRuntimeConfig(
knowledge_base_id=knowledge_base_id,
index_owner_user_id=kb.user_id,
retriever_config=self._build_resolved_retriever_config(
db=db,
user_id=user_id,
name=retriever_name,
namespace=retriever_namespace,
),
embedding_model_config=self._build_resolved_embedding_model_config(
db=db,
user_id=user_id,
model_name=embedding_model_name,
model_namespace=embedding_model_namespace,
user_name=user_name,
),
retrieval_config=RuntimeRetrievalConfig(
top_k=max_results,
score_threshold=score_threshold,
retrieval_mode=retrieval_mode,
vector_weight=vector_weight,
keyword_weight=keyword_weight,
),

⚠️ Potential issue | 🟠 Major

Resolve public-query configs with the KB owner, not the requester.

Line 179 stores kb.user_id as the index owner, but Lines 180-192 resolve both the retriever and embedding model with the requester's user_id. For shared/public KBs, a same-named tenant resource can override the KB's actual runtime config, leading to wrong backends or failed queries.

Suggested fix
                     retriever_config=self._build_resolved_retriever_config(
                         db=db,
-                        user_id=user_id,
+                        user_id=kb.user_id,
                         name=retriever_name,
                         namespace=retriever_namespace,
                     ),
                     embedding_model_config=self._build_resolved_embedding_model_config(
                         db=db,
-                        user_id=user_id,
+                        user_id=kb.user_id,
                         model_name=embedding_model_name,
                         model_namespace=embedding_model_namespace,
                         user_name=user_name,
                     ),

Based on learnings, "Kind resource is uniquely identified by THREE fields: namespace, name, and user_id - always use all three fields to locate a Kind".


Comment on line +5
sidebar_position: 1
---

# Knowledge Runtime 独立化设计

⚠️ Potential issue | 🟡 Minor

Avoid repeating the sidebar category in the title.

Line 5 currently starts with “Knowledge …”, which likely duplicates the sidebar category context for this page.

✏️ Suggested title tweak
-# Knowledge Runtime 独立化设计
+# Runtime 独立化设计

As per coding guidelines, docs/**/*.md: Document titles should NOT repeat the sidebar category name.


Comment on lines +58 to +68
async def _aget_query_embedding(self, query: str) -> list[float]:
    return await asyncio.to_thread(self._get_query_embedding, query)

async def _aget_text_embedding(self, text: str) -> list[float]:
    return await asyncio.to_thread(self._get_text_embedding, text)

@retry(
-    stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=2, max=10)
+    stop=stop_after_attempt(3),
+    wait=wait_exponential(multiplier=1, min=2, max=10),
)
-def _call_api(self, text: str) -> List[float]:
-    """
-    Call external embedding API with retry mechanism.
-
-    Args:
-        text: Text to embed
-
-    Returns:
-        Embedding vector
-
-    Raises:
-        requests.HTTPError: If API call fails after retries
-    """
+def _call_api(self, text: str) -> list[float]:

🛠️ Refactor suggestion | 🟠 Major

Add OpenTelemetry tracing decorators to changed async/sync methods.

These updated methods are still missing tracing decorators required by repository standards.

As per coding guidelines, "Use @trace_async(span_name, tracer_name, extract_attributes) decorator to trace entire async functions in OpenTelemetry" and "Use @trace_sync(span_name, tracer_name, extract_attributes) decorator to trace entire sync functions in OpenTelemetry".


Comment on lines +49 to +73
async def query(self, request: RemoteQueryRequest) -> RemoteQueryResponse:
metadata_condition = self._combine_metadata_conditions(
self._build_document_filter(request.document_ids),
request.metadata_condition,
)
records: list[RemoteQueryRecord] = []

for kb_config in request.knowledge_base_configs:
storage_backend = create_storage_backend_from_runtime_config(
kb_config.retriever_config
)
embed_model = create_embedding_model_from_runtime_config(
kb_config.embedding_model_config
)
executor = QueryExecutor(
storage_backend=storage_backend,
embed_model=embed_model,
)
result = await executor.execute(
knowledge_id=str(kb_config.knowledge_base_id),
query=request.query,
retrieval_config=kb_config.retrieval_config,
metadata_condition=metadata_condition,
user_id=kb_config.index_owner_user_id,
)

⚠️ Potential issue | 🟠 Major

Do not silently ignore enabled_index_families / retrieval_policy.

Lines 49-73 accept these contract fields but never apply or validate them, so the runtime can acknowledge policies it doesn’t execute.

🛠️ Fail-fast until multi-family/policy execution is wired
 async def query(self, request: RemoteQueryRequest) -> RemoteQueryResponse:
+    unsupported_families = [
+        family for family in request.enabled_index_families if family != "chunk_vector"
+    ]
+    if unsupported_families:
+        raise ValueError(
+            f"Unsupported enabled_index_families: {unsupported_families}"
+        )
+    if request.retrieval_policy not in (None, "chunk_only"):
+        raise ValueError(
+            f"Unsupported retrieval_policy: {request.retrieval_policy}"
+        )
+
     metadata_condition = self._combine_metadata_conditions(
         self._build_document_filter(request.document_ids),
         request.metadata_condition,
     )
