fix: Use ID-filtered graph projection in COT and Context Extension retrievers #2229

Open
Vasilije1990 wants to merge 1 commit into dev from fix/id-filtered-graph-cot-context-extension

Conversation

Vasilije1990 (Contributor) commented Feb 24, 2026

Summary

  • COT and Context Extension retrievers always called get_triplets(query_batch=...) even for single queries, forcing batch mode in brute_force_triplet_search. Batch mode sets wide_search_limit=None, which bypasses node ID extraction and causes full graph projection instead of ID-filtered projection.
  • Added get_triplets_batch() helper to GraphCompletionRetriever that delegates to single-query mode (query=) when len(queries)==1, enabling ID-filtered graph projection. Both subclass retrievers now use this helper.
  • After this fix, single-query searches in GRAPH_COMPLETION_COT and GRAPH_COMPLETION_CONTEXT_EXTENSION log "Retrieving ID-filtered graph from database" instead of "Retrieving full graph", matching GRAPH_COMPLETION behavior.
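The mode selection described in the summary can be illustrated with a toy model. This is a minimal sketch, not cognee's code: `choose_projection`, `dispatch`, and `default_limit` are hypothetical names, and the only behavior taken from the description is that batch mode sets `wide_search_limit=None`, which leads to full graph projection.

```python
from typing import List, Optional


def choose_projection(
    query: Optional[str] = None,
    query_batch: Optional[List[str]] = None,
    default_limit: int = 100,  # hypothetical default, for illustration only
) -> str:
    """Toy stand-in for the projection decision; returns a label, not a graph."""
    if (query is None) == (query_batch is None):
        raise ValueError("pass exactly one of query or query_batch")
    # Per the PR description: batch mode sets wide_search_limit=None.
    wide_search_limit = None if query_batch is not None else default_limit
    if wide_search_limit is None:
        # No limit means no node ID extraction, so the whole graph is projected.
        return "full"
    return "id_filtered"


def dispatch(queries: List[str]) -> str:
    """Mirror of the helper's rule: one query goes through single-query mode."""
    if len(queries) == 1:
        return choose_projection(query=queries[0])
    return choose_projection(query_batch=queries)
```

Passing a one-element list as `query_batch` directly still yields `"full"` in this model, which is the behavior the PR is fixing; `dispatch` routes the same input to `"id_filtered"`.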

Changes

| File | Change |
| --- | --- |
| graph_completion_retriever.py | Added get_triplets_batch() helper: uses single-query mode for 1 query, batch mode for multiple |
| graph_completion_cot_retriever.py | _fetch_initial_triplets_and_context and _merge_followup_triplets now use get_triplets_batch() |
| graph_completion_context_extension_retriever.py | get_retrieved_objects and _run_extension_round now use get_triplets_batch() |

Test plan

  • Run single-query search with GRAPH_COMPLETION_COT — verify logs show "Retrieving ID-filtered graph from database"
  • Run single-query search with GRAPH_COMPLETION_CONTEXT_EXTENSION — verify logs show "Retrieving ID-filtered graph from database"
  • Run single-query search with GRAPH_COMPLETION — verify behavior unchanged
  • Run batch-query search with COT/Context Extension — verify batch mode still works (full graph projection for multi-query batches)
  • Run existing retrieval tests
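The log checks in the test plan could be automated along these lines. This is a sketch under stated assumptions: `StubRetriever`, `ListHandler`, `captured_logs`, and the logger name are hypothetical, and the stub only emits the two log messages quoted in this PR rather than touching a real database.

```python
import asyncio
import logging
from typing import List

logger = logging.getLogger("retrieval_sketch")  # hypothetical logger name


class StubRetriever:
    """Hypothetical stand-in for the retriever; it only logs which mode ran."""

    async def get_triplets(self, query=None, query_batch=None):
        if query_batch is not None:
            logger.info("Retrieving full graph")
            return [[] for _ in query_batch]
        logger.info("Retrieving ID-filtered graph from database")
        return []

    async def get_triplets_batch(self, queries: List[str]):
        # Same dispatch rule as the helper added in this PR.
        if len(queries) == 1:
            return [await self.get_triplets(query=queries[0])]
        return await self.get_triplets(query_batch=queries)


class ListHandler(logging.Handler):
    """Collects formatted log messages in memory for assertions."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


def captured_logs(queries: List[str]) -> List[str]:
    """Run the stub once and return the messages it logged."""
    handler = ListHandler()
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    try:
        asyncio.run(StubRetriever().get_triplets_batch(queries))
    finally:
        logger.removeHandler(handler)
    return handler.messages
```

In a real test the stub would be replaced by the actual COT or Context Extension retriever, and the handler attached to whichever logger the project uses.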

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • Refactor
    • Internal optimization of retrieval API to use batch-oriented processing methods, improving consistency across graph completion retrieval modules.

…trievers

COT and Context Extension retrievers always called get_triplets(query_batch=...)
even for single queries, forcing batch mode in brute_force_triplet_search.
Batch mode sets wide_search_limit=None, bypassing node ID extraction and
causing full graph projection instead of ID-filtered projection.

Add get_triplets_batch() helper that delegates to single-query mode (query=)
when len(queries)==1, enabling ID-filtered graph projection. Both retrievers
now use this helper so logs show "Retrieving ID-filtered graph from database"
for single queries, matching Graph Completion behavior.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: vasilije <[email protected]>
coderabbitai bot (Contributor) commented Feb 24, 2026

Walkthrough

Consolidates the triplet retrieval API by introducing and switching to a batch-oriented get_triplets_batch method across retrieval modules, replacing parameterized get_triplets(query_batch=...) calls while preserving existing logic and behavior.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| API Addition: cognee/modules/retrieval/graph_completion_retriever.py | Introduces new public method get_triplets_batch(queries: List[str]) that delegates to single-query processing for single inputs and uses batch processing for multiple queries, returning wrapped results. |
| API Call Updates: cognee/modules/retrieval/graph_completion_context_extension_retriever.py, cognee/modules/retrieval/graph_completion_cot_retriever.py | Updates four call sites from get_triplets(query_batch=...) to get_triplets_batch(...) across get_retrieved_objects, _run_extension_round, _fetch_initial_triplets_and_context, and _merge_followup_triplets methods. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

Suggested labels

run-checks

Suggested reviewers

  • hajdul88
  • lxobr
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main change: implementing ID-filtered graph projection in COT and Context Extension retrievers, which is the core purpose of this pull request. |
| Description check | ✅ Passed | The description provides clear human-generated context on the problem, solution, affected files, and test plan. It covers the key issue (batch mode forcing full graph projection) and the fix (adding get_triplets_batch helper), meeting the template requirements. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
cognee/modules/retrieval/graph_completion_retriever.py (1)

167-177: Missing Parameters section in docstring

The docstring describes the return value but omits a Parameters section for the queries argument. Per coding guidelines, function definitions without complete documentation are considered incomplete.

✏️ Suggested docstring addition
     """
     Retrieves triplets for a list of queries, using single-query mode when
     possible to enable ID-filtered graph projection.

+    Parameters:
+    -----------
+        - queries (List[str]): One or more query strings. A single-element list
+          uses single-query mode to enable ID-filtered graph projection; multiple
+          queries use batch mode.
+
     When there is only one query, delegates to single-query mode (query=)
     which computes relevant node IDs and filters the graph projection.
     For multiple queries, uses batch mode (query_batch=).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cognee/modules/retrieval/graph_completion_retriever.py` around lines 167 -
177, Add a Parameters section to the docstring of the retrieval function in
graph_completion_retriever.py (the function that "Retrieves triplets for a list
of queries" and switches between single-query mode and batch mode) documenting
the queries argument and its type/shape and any other important parameters
(e.g., queries: List[str] — one query per requested result; clarify expected
element type and whether None/empty lists are allowed), and include any relevant
parameter behavior (single-query triggers ID-filtered graph projection). Keep
wording consistent with the existing Returns section and project docstring
style.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 0e7ce15 and 7c9ee35.

📒 Files selected for processing (3)
  • cognee/modules/retrieval/graph_completion_context_extension_retriever.py
  • cognee/modules/retrieval/graph_completion_cot_retriever.py
  • cognee/modules/retrieval/graph_completion_retriever.py

Comment on lines +163 to +181
    async def get_triplets_batch(
        self,
        queries: List[str],
    ) -> List[List[Edge]]:
        """
        Retrieves triplets for a list of queries, using single-query mode when
        possible to enable ID-filtered graph projection.

        When there is only one query, delegates to single-query mode (query=)
        which computes relevant node IDs and filters the graph projection.
        For multiple queries, uses batch mode (query_batch=).

        Returns:
            List[List[Edge]]: One list of edges per query.
        """
        if len(queries) == 1:
            triplets = await self.get_triplets(query=queries[0])
            return [triplets]
        return await self.get_triplets(query_batch=queries)

⚠️ Potential issue | 🟡 Minor

Type annotation mismatch and missing empty-list guard in get_triplets_batch

Two issues:

  1. mypy incompatibility: get_triplets is typed -> Union[List[Edge], List[List[Edge]]], so both return paths fail mypy's return-type check against the declared -> List[List[Edge]]:

    • Single-query branch: [triplets] has inferred type List[Union[List[Edge], List[List[Edge]]]].
    • Multi-query branch: direct return of Union[List[Edge], List[List[Edge]]].
  2. No guard for empty queries: when len(queries) == 0, the call falls through to get_triplets(query_batch=[]) whose behaviour with an empty batch is undefined in brute_force_triplet_search. As a public method, this edge case should be defended.
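One way to address the typing issue without `cast` would be to give `get_triplets` overloaded signatures, so a type checker can pick the return type from the keyword used. The sketch below assumes that `get_triplets` can be made keyword-only, which may not match the real signature; `Edge`, `SketchRetriever`, and `result_shape` are placeholders written for this illustration, and the stub returns dummy data.

```python
import asyncio
from typing import List, Optional, Union, overload


class Edge:
    """Placeholder for the project's Edge type (assumption for this sketch)."""


class SketchRetriever:
    """Hypothetical stand-in; not the real GraphCompletionRetriever."""

    @overload
    async def get_triplets(self, *, query: str) -> List[Edge]: ...

    @overload
    async def get_triplets(self, *, query_batch: List[str]) -> List[List[Edge]]: ...

    async def get_triplets(
        self,
        *,
        query: Optional[str] = None,
        query_batch: Optional[List[str]] = None,
    ) -> Union[List[Edge], List[List[Edge]]]:
        # Stub data: one Edge per query, so result shapes are easy to inspect.
        if query_batch is not None:
            return [[Edge()] for _ in query_batch]
        return [Edge()]

    async def get_triplets_batch(self, queries: List[str]) -> List[List[Edge]]:
        # Guard the empty case before choosing a mode.
        if not queries:
            return []
        if len(queries) == 1:
            # The first overload applies here, so the result is List[Edge].
            return [await self.get_triplets(query=queries[0])]
        # The second overload applies here, so the result is List[List[Edge]].
        return await self.get_triplets(query_batch=queries)


def result_shape(queries: List[str]) -> List[int]:
    """Run the stub and return the number of edges found per query."""
    batches = asyncio.run(SketchRetriever().get_triplets_batch(queries))
    return [len(edges) for edges in batches]
```

Compared with `cast`, overloads keep the type information at the definition of `get_triplets`, at the cost of changing its declared signature.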

🛠️ Proposed fix
+from typing import cast

 async def get_triplets_batch(
     self,
     queries: List[str],
 ) -> List[List[Edge]]:
+    if not queries:
+        return []
     if len(queries) == 1:
-        triplets = await self.get_triplets(query=queries[0])
+        triplets = cast(List[Edge], await self.get_triplets(query=queries[0]))
         return [triplets]
-    return await self.get_triplets(query_batch=queries)
+    return cast(List[List[Edge]], await self.get_triplets(query_batch=queries))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cognee/modules/retrieval/graph_completion_retriever.py` around lines 163 -
181, The method get_triplets_batch must guard the empty-input case and normalize
the heterogeneous return shape of get_triplets so the declared return type
List[List[Edge]] is satisfied for mypy: first, if queries is empty return an
empty list immediately; second, when len(queries)==1 call
get_triplets(query=...) and normalize its result so you always return a
List[List[Edge]] (if get_triplets returns List[Edge] wrap it as [result], if it
returns List[List[Edge]] use it but ensure you return exactly one inner list);
third, when calling get_triplets(query_batch=...) assert/coerce the batch result
to List[List[Edge]] (if the call yields a flat List[Edge] wrap it into a
single-item list-per-query mapping) so both branches have the same concrete type
and mypy passes. Ensure you reference get_triplets and get_triplets_batch while
making these checks and conversions.
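The runtime normalization that this prompt describes might look like the following. It is a sketch only: `normalize_triplets` and `query_count` are hypothetical names, `Edge` is a placeholder class, and the check is a runtime one, so a static checker may still need a `cast` or overloads to accept the result.

```python
from typing import List, Union


class Edge:
    """Placeholder for the project's Edge type (assumption for this sketch)."""


def normalize_triplets(
    result: Union[List[Edge], List[List[Edge]]],
    query_count: int,
) -> List[List[Edge]]:
    """Coerce either result shape into one list of edges per query."""
    if query_count == 0:
        return []
    # A non-empty result whose items are all lists is treated as already nested.
    is_nested = bool(result) and all(isinstance(item, list) for item in result)
    nested = list(result) if is_nested else [list(result)]
    if len(nested) != query_count:
        raise ValueError(
            f"expected {query_count} edge lists, got {len(nested)}"
        )
    return nested
```

An empty flat result for a single query comes back as `[[]]`, so callers can still index the result by query position.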
