fix: Use ID-filtered graph projection in COT and Context Extension retrievers#2229
fix: Use ID-filtered graph projection in COT and Context Extension retrievers#2229Vasilije1990 wants to merge 1 commit intodevfrom
Conversation
…trievers COT and Context Extension retrievers always called get_triplets(query_batch=...) even for single queries, forcing batch mode in brute_force_triplet_search. Batch mode sets wide_search_limit=None, bypassing node ID extraction and causing full graph projection instead of ID-filtered projection. Add get_triplets_batch() helper that delegates to single-query mode (query=) when len(queries)==1, enabling ID-filtered graph projection. Both retrievers now use this helper so logs show "Retrieving ID-filtered graph from database" for single queries, matching Graph Completion behavior. Co-Authored-By: Claude Opus 4.6 <[email protected]> Signed-off-by: vasilije <[email protected]>
WalkthroughConsolidates the triplet retrieval API by introducing and switching to a batch-oriented Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
cognee/modules/retrieval/graph_completion_retriever.py (1)
167-177: MissingParameterssection in docstringThe docstring describes the return value but omits a
Parameterssection for thequeriesargument. Per coding guidelines, function definitions without complete documentation are considered incomplete.✏️ Suggested docstring addition
""" Retrieves triplets for a list of queries, using single-query mode when possible to enable ID-filtered graph projection. + Parameters: + ----------- + - queries (List[str]): One or more query strings. A single-element list + uses single-query mode to enable ID-filtered graph projection; multiple + queries use batch mode. + When there is only one query, delegates to single-query mode (query=) which computes relevant node IDs and filters the graph projection. For multiple queries, uses batch mode (query_batch=).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cognee/modules/retrieval/graph_completion_retriever.py` around lines 167 - 177, Add a Parameters section to the docstring of the retrieval function in graph_completion_retriever.py (the function that "Retrieves triplets for a list of queries" and switches between single-query mode and batch mode) documenting the queries argument and its type/shape and any other important parameters (e.g., queries: List[str] — one query per requested result; clarify expected element type and whether None/empty lists are allowed), and include any relevant parameter behavior (single-query triggers ID-filtered graph projection). Keep wording consistent with the existing Returns section and project docstring style.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cognee/modules/retrieval/graph_completion_retriever.py`:
- Around line 163-181: The method get_triplets_batch must guard the empty-input
case and normalize the heterogeneous return shape of get_triplets so the
declared return type List[List[Edge]] is satisfied for mypy: first, if queries
is empty return an empty list immediately; second, when len(queries)==1 call
get_triplets(query=...) and normalize its result so you always return a
List[List[Edge]] (if get_triplets returns List[Edge] wrap it as [result], if it
returns List[List[Edge]] use it but ensure you return exactly one inner list);
third, when calling get_triplets(query_batch=...) assert/coerce the batch result
to List[List[Edge]] (if the call yields a flat List[Edge] wrap it into a
single-item list-per-query mapping) so both branches have the same concrete type
and mypy passes. Ensure you reference get_triplets and get_triplets_batch while
making these checks and conversions.
---
Nitpick comments:
In `@cognee/modules/retrieval/graph_completion_retriever.py`:
- Around line 167-177: Add a Parameters section to the docstring of the
retrieval function in graph_completion_retriever.py (the function that
"Retrieves triplets for a list of queries" and switches between single-query
mode and batch mode) documenting the queries argument and its type/shape and any
other important parameters (e.g., queries: List[str] — one query per requested
result; clarify expected element type and whether None/empty lists are allowed),
and include any relevant parameter behavior (single-query triggers ID-filtered
graph projection). Keep wording consistent with the existing Returns section and
project docstring style.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
cognee/modules/retrieval/graph_completion_context_extension_retriever.pycognee/modules/retrieval/graph_completion_cot_retriever.pycognee/modules/retrieval/graph_completion_retriever.py
| async def get_triplets_batch( | ||
| self, | ||
| queries: List[str], | ||
| ) -> List[List[Edge]]: | ||
| """ | ||
| Retrieves triplets for a list of queries, using single-query mode when | ||
| possible to enable ID-filtered graph projection. | ||
|
|
||
| When there is only one query, delegates to single-query mode (query=) | ||
| which computes relevant node IDs and filters the graph projection. | ||
| For multiple queries, uses batch mode (query_batch=). | ||
|
|
||
| Returns: | ||
| List[List[Edge]]: One list of edges per query. | ||
| """ | ||
| if len(queries) == 1: | ||
| triplets = await self.get_triplets(query=queries[0]) | ||
| return [triplets] | ||
| return await self.get_triplets(query_batch=queries) |
There was a problem hiding this comment.
Type annotation mismatch and missing empty-list guard in get_triplets_batch
Two issues:
-
mypyincompatibility:get_tripletsis typed-> Union[List[Edge], List[List[Edge]]], so both return paths fail mypy's return-type check against the declared-> List[List[Edge]]:- Single-query branch:
[triplets]has inferred typeList[Union[List[Edge], List[List[Edge]]]]. - Multi-query branch: direct return of
Union[List[Edge], List[List[Edge]]].
- Single-query branch:
-
No guard for empty
queries: whenlen(queries) == 0, the call falls through toget_triplets(query_batch=[])whose behaviour with an empty batch is undefined inbrute_force_triplet_search. As a public method, this edge case should be defended.
🛠️ Proposed fix
+from typing import cast
async def get_triplets_batch(
self,
queries: List[str],
) -> List[List[Edge]]:
+ if not queries:
+ return []
if len(queries) == 1:
- triplets = await self.get_triplets(query=queries[0])
+ triplets = cast(List[Edge], await self.get_triplets(query=queries[0]))
return [triplets]
- return await self.get_triplets(query_batch=queries)
+ return cast(List[List[Edge]], await self.get_triplets(query_batch=queries))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cognee/modules/retrieval/graph_completion_retriever.py` around lines 163 -
181, The method get_triplets_batch must guard the empty-input case and normalize
the heterogeneous return shape of get_triplets so the declared return type
List[List[Edge]] is satisfied for mypy: first, if queries is empty return an
empty list immediately; second, when len(queries)==1 call
get_triplets(query=...) and normalize its result so you always return a
List[List[Edge]] (if get_triplets returns List[Edge] wrap it as [result], if it
returns List[List[Edge]] use it but ensure you return exactly one inner list);
third, when calling get_triplets(query_batch=...) assert/coerce the batch result
to List[List[Edge]] (if the call yields a flat List[Edge] wrap it into a
single-item list-per-query mapping) so both branches have the same concrete type
and mypy passes. Ensure you reference get_triplets and get_triplets_batch while
making these checks and conversions.
Summary
get_triplets(query_batch=...)even for single queries, forcing batch mode inbrute_force_triplet_search. Batch mode setswide_search_limit=None, which bypasses node ID extraction and causes full graph projection instead of ID-filtered projection.get_triplets_batch()helper toGraphCompletionRetrieverthat delegates to single-query mode (query=) whenlen(queries)==1, enabling ID-filtered graph projection. Both subclass retrievers now use this helper.GRAPH_COMPLETION_COTandGRAPH_COMPLETION_CONTEXT_EXTENSIONlog "Retrieving ID-filtered graph from database" instead of "Retrieving full graph", matchingGRAPH_COMPLETIONbehavior.Changes
graph_completion_retriever.pyget_triplets_batch()helper: uses single-query mode for 1 query, batch mode for multiplegraph_completion_cot_retriever.py_fetch_initial_triplets_and_contextand_merge_followup_tripletsnow useget_triplets_batch()graph_completion_context_extension_retriever.pyget_retrieved_objectsand_run_extension_roundnow useget_triplets_batch()Test plan
GRAPH_COMPLETION_COT— verify logs show "Retrieving ID-filtered graph from database"GRAPH_COMPLETION_CONTEXT_EXTENSION— verify logs show "Retrieving ID-filtered graph from database"GRAPH_COMPLETION— verify behavior unchanged🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes