Enhancement: Intelligent hybrid/multihop retrieval (MCP-driven) — MVP

**Goal**
Add support for multi-hop graph/semantic retrieval: a user query is analyzed by the Sequential Thinking MCP, which can now return a plan consisting of several retrieval “hops” (steps), not just a single hybrid or sequential step. Each step can be graph or semantic, and context is passed between steps (e.g., entities, file paths, or search terms), allowing the search to “teleport” between code regions, relationships, and semantic spaces.

**Problem**
Current retrieval is limited to single-hop, hybrid, or sequential (2-hop) strategies. This blocks advanced use cases such as:
- “Find all models used by endpoints that process payment, then show tests covering those models.” This decomposes into three hops (payment endpoints → models they use → tests covering those models):
  - Step 1 (Graph): find endpoints matching "payment". Extract: entity_ids of the payment endpoints.
  - Step 2 (Graph): find models used by those endpoints. Extract: qualified_names of the model classes.
  - Step 3 (Semantic): search for "test functions", filtered by the model entities. Output: ranked test functions with provenance.
- “Surface functions that open database connections, then all places that call those functions, and list recent PRs touching those callers.”
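
For illustration, the payment example above could be written as a plan using the schema proposed later in this issue. The query strings, the notes, and the choice of `extract` values (in particular mapping qualified names to `symbols`) are assumptions made for this sketch, not output from the MCP.

```python
# Hypothetical plan for the payment example; field names follow the proposed schema.
payment_plan = [
    {"step": 1, "type": "graph", "query": "endpoints that process payment",
     "extract": "entity_ids", "notes": "seed hop: payment endpoints"},
    {"step": 2, "type": "graph", "query": "models used by those endpoints",
     "extract": "symbols", "notes": "chained on endpoint entity_ids"},
    {"step": 3, "type": "semantic", "query": "test functions",
     "extract": "file_paths", "notes": "filtered by model symbols"},
]

# Hop sequence for this query: two graph hops followed by one semantic hop.
hop_types = [step["type"] for step in payment_plan]
print(" -> ".join(hop_types))  # prints: graph -> graph -> semantic
```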

**Proposed MVP**
- Integrate/extend the Sequential Thinking MCP to support returning a plan (list of retrieval steps/hops): each with type (graph/semantic), input/query, and extraction method.
- Orchestrator tool (intelligent_code_search) executes plan steps in order:
  - Context from each hop (entity IDs, file paths, tokens) is passed to the next as filters, seed queries, or boosting.
  - Each step’s results and provenance are tracked; merged output contains step-wise provenance.
  - Early stopping is supported when a stop condition is met (e.g., “enough results”, “confidence > threshold”).
  - The orchestrator falls back if the MCP call or any critical hop fails.
- Hybrid/parallel steps (two at same plan depth) still supported for cases where both approaches run independently.
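
The sequential part of the steps above can be sketched as a small loop. This is a minimal sketch, not the actual `intelligent_code_search` implementation: `run_plan`, `HopResult`, `EXTRACT_FIELD`, the retriever signature `(query, context) -> hits`, and the shape of a hit (a dict) are all hypothetical names and assumptions introduced here. Parallel hops are left out.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: maps a step's "extract" value to the hit field it reads.
EXTRACT_FIELD = {"entity_ids": "entity_id", "file_paths": "file_path", "symbols": "symbol"}


@dataclass
class HopResult:
    step: int       # plan step that produced these hits (provenance)
    hits: list      # raw hits returned by the retriever
    context: list   # values extracted from the hits and passed to the next hop


def run_plan(plan, retrievers, fallback: Callable, should_stop=lambda results: False):
    """Execute plan steps in order, chaining extracted context between hops."""
    results, context = [], []
    for step in plan:
        try:
            # Each retriever receives the step query plus context from the previous hop.
            hits = retrievers[step["type"]](step["query"], context)
        except Exception:
            # A failing hop aborts the plan and defers to the fallback strategy.
            return fallback()
        field = EXTRACT_FIELD[step["extract"]]
        context = [hit[field] for hit in hits if field in hit]
        results.append(HopResult(step["step"], hits, context))
        if should_stop(results):
            break  # early stopping: skip the remaining hops
    return results
```

Keeping the per-hop `HopResult` objects, rather than only the final hits, is what lets the merged output report which step produced each result.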

**Acceptance Criteria**
- Given a multi-hop query, MCP returns a plan with ≥2 hops (e.g., graph→graph→semantic); orchestrator iterates, passing extracted context and producing final and intermediate results, each with step provenance.
- Given a query that only requires single-hop or sequential retrieval, the existing behavior (a plan of length 1 or 2) is preserved.
- Unit and integration tests exist for:  
  - plan execution (correct context chaining across >2 steps),  
  - early stopping,  
  - fallback when a hop fails,  
  - output provenance (which hop produced which result).
- Support for parallel/branching hops (if returned by MCP) is optionally stubbed out or detailed for v2.

**Implementation Plan**
1) **MCP Client + Plan Schema (2–3 hrs)**
   - sequential_thinking_mcp.py: support plan-style MCP JSON with arbitrary-length “plan” array, each entry:
     ```json
     {
       "step": 1,
       "type": "graph" | "semantic",
       "query": "natural language or cypher hint",
       "extract": "entity_ids|file_paths|symbols",
       "notes": "why this retrieval"
     }
     ```
   - The MCP response also includes a top-level confidence and optional stop_conditions.
   - .env.example: update as needed.
   - Unit test: mock multi-hop MCP response, validate plan extraction.

2) **Orchestrator with Multi-hop Loop (3–4 hrs)**
   - intelligent_code_search: accept plan, execute steps in order, extract and pass context, handle early stopping and fallback.
   - Preserve branch/hybrid support: parallelize hops if marked as independent.
   - Unit and integration tests: synthetic multi-hop plans, step failures, context corner-cases.

3) **Provenance/Merging + Docs (2–3 hrs)**
   - result_merger.py: combine hop results, record source+step provenance in all returned hits.
   - Small integration test: feed canonical multi-hop plan and sample codebase, assert output shape and correctness.
   - Update docs, .env, and issue test matrix for multi-hop usage.
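
The provenance requirement in step 3 could look roughly like the following. This is a sketch under assumptions: `merge_hop_results` is a hypothetical helper (not the existing `result_merger.py` API), each hit is assumed to be a dict with a stable `id` key, and each hop is passed as a `(step, source, hits)` tuple.

```python
def merge_hop_results(hop_results, key="id"):
    """Merge hits from all hops, tagging each hit with the hops that produced it."""
    merged = {}
    for step, source, hits in hop_results:
        for hit in hits:
            # First occurrence wins for the hit's fields; later hops only add provenance.
            entry = merged.setdefault(hit[key], {**hit, "provenance": []})
            entry["provenance"].append({"step": step, "source": source})
    # Dicts preserve insertion order, so hits come back in first-seen order.
    return list(merged.values())


hops = [
    (1, "graph", [{"id": "a"}, {"id": "b"}]),
    (2, "semantic", [{"id": "b", "score": 0.9}]),
]
merged_hits = merge_hop_results(hops)
```

Here `merged_hits` holds two entries, and the entry for `b` lists both step 1 (graph) and step 2 (semantic) in its `provenance`. Score merging and re-ranking are deliberately left out of this sketch.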

**Files to add/modify**
- codebase_rag/tools/sequential_thinking_mcp.py — MCP client + plan schema
- codebase_rag/tools/intelligent_retrieval.py — orchestrator (now multi-hop plan/layered execution)
- codebase_rag/tools/result_merger.py — result/provenance handling
- codebase_rag/services/llm.py — MCP config
- .env.example

**MCP plan schema (proposed)**

    {
      "plan": [
        {
          "step": 1,
          "type": "graph" | "semantic",
          "query": "natural language or cypher hint",
          "extract": "entity_ids|file_paths|symbols",
          "notes": "why this retrieval"
        },
        {
          "step": 2,
          "type": "graph" | "semantic",
          "query": "refined or chained query",
          "extract": "file_paths|symbols",
          "notes": ""
        }
        ...
      ],
      "confidence": 0.0-1.0,
      "stop_conditions": ["enough_results", "confidence>0.85"],
      "notes": "optional top-level reasoning"
    }
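
A client could validate a response in this shape along the following lines. This is a minimal sketch, assuming the MCP returns the plan as a JSON string; `PlanStep`, `parse_plan`, `should_stop`, and the `enough` threshold of 10 results are hypothetical names and values introduced here, and only the two stop-condition forms shown in the schema are handled.

```python
import json
from dataclasses import dataclass

VALID_TYPES = {"graph", "semantic"}


@dataclass(frozen=True)
class PlanStep:
    step: int
    type: str
    query: str
    extract: str
    notes: str = ""


def parse_plan(raw: str):
    """Parse an MCP response into (ordered steps, confidence, stop_conditions)."""
    data = json.loads(raw)
    steps = []
    for entry in data["plan"]:
        if entry["type"] not in VALID_TYPES:
            raise ValueError(f"unknown step type: {entry['type']!r}")
        steps.append(PlanStep(entry["step"], entry["type"], entry["query"],
                              entry["extract"], entry.get("notes", "")))
    steps.sort(key=lambda s: s.step)  # execute in step order regardless of array order
    return steps, float(data.get("confidence", 0.0)), list(data.get("stop_conditions", []))


def should_stop(conditions, confidence, result_count, enough=10):
    """Return True if any stop condition holds. `enough` is an assumed threshold."""
    for cond in conditions:
        if cond == "enough_results" and result_count >= enough:
            return True
        if cond.startswith("confidence>") and confidence > float(cond.split(">", 1)[1]):
            return True
    return False
```

Unknown stop conditions are ignored rather than rejected in this sketch, so a newer MCP response would not break an older client; stricter validation may be preferable in practice.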

**Multi-hop test examples**
- “Find classes implementing Authentication, then find all decorators applied to them, then list test functions that test those decorators.”
- “Show middleware functions that call logging, then show all error handlers that call those middleware functions.”

**Checklist**
- [ ] Add MCP client wrapper and plan schema support
- [ ] Implement orchestrator with multi-hop plan loop and context extraction/passing
- [ ] Implement provenance and merging logic
- [ ] Add unit/integration tests for multi-hop plan execution (≥3 hops, parallel hops, fallback)
- [ ] Update docs, .env, and basic test queries for multi-hop usage

**Estimated total effort:** 7–10 hours

