recall: Graph expansion follows too many hops through generic/hub nodes #74

@jack-arturo

Problem

Graph expansion during recall follows too many hops through generic entity nodes, pulling in memories that are connected only through shared infrastructure or common contacts — not through meaningful semantic relationships.

Example Traversal Path

Query: "Alex Panagis" with expand_entities=true

Alex Panagis (memory) 
  → entity:organizations:automem 
    → Alex Beck memories (also uses AutoMem)
      → entity:people:jack 
        → Zack Katz, Luka, Mastermind crew, every Jack-related memory

The expansion followed: person → tool → different person → shared contact → everyone. By hop 3, results have zero relevance to the original query.
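To make the path above concrete, here is a minimal sketch of an unbounded breadth-first expansion over a toy graph that mirrors the example. The node IDs for memories (`memory:...`), the `GRAPH` dict, and the `expand` helper are all hypothetical and are not AutoMem's actual storage model or API. The sketch also assumes every edge counts as one hop.

```python
from collections import deque

# Toy adjacency list mirroring the traversal above.
# Memory node names are made up for illustration.
GRAPH = {
    "memory:alex-panagis": ["entity:organizations:automem"],
    "entity:organizations:automem": ["memory:alex-panagis", "memory:alex-beck"],
    "memory:alex-beck": ["entity:organizations:automem", "entity:people:jack"],
    "entity:people:jack": [
        "memory:alex-beck",
        "memory:zack-katz",
        "memory:luka",
        "memory:mastermind-crew",
    ],
}


def expand(graph, seed, max_hops):
    """Breadth-first expansion; returns {node: hop distance from seed}."""
    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return depth


if __name__ == "__main__":
    for node, hop in expand(GRAPH, "memory:alex-panagis", max_hops=4).items():
        print(hop, node)
```

With a generous hop limit, the Zack Katz, Luka, and Mastermind memories are all reachable from the Alex Panagis seed purely through the `automem` and `jack` entity nodes.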

Current Mitigation

The expand_min_importance and expand_min_strength params help somewhat (cut results from 40 → 16), but they filter on node properties, not on path relevance. A high-importance memory about Alex Beck is still irrelevant to an Alex Panagis query, regardless of its importance score.
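A small sketch of why node-property filtering is not enough. The `Candidate` type, the memory titles, and all numeric values are invented for illustration; only the `expand_min_importance` parameter name comes from the existing API, and `query_relevance` is a hypothetical score measured against the original query.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str
    importance: float       # node property, independent of the query
    query_relevance: float  # hypothetical similarity to the original query


# Made-up values for an "Alex Panagis" query.
candidates = [
    Candidate("Alex Panagis intro call", importance=0.6, query_relevance=0.72),
    Candidate("Alex Beck onboarding notes", importance=0.9, query_relevance=0.10),
    Candidate("Old AutoMem changelog entry", importance=0.2, query_relevance=0.05),
]


def filter_by_importance(items, expand_min_importance):
    """Keeps anything important enough, whatever the query was."""
    return [c for c in items if c.importance >= expand_min_importance]


def filter_by_query_relevance(items, min_relevance):
    """Keeps only items that relate to the original query."""
    return [c for c in items if c.query_relevance >= min_relevance]


if __name__ == "__main__":
    print([c.title for c in filter_by_importance(candidates, 0.5)])
    print([c.title for c in filter_by_query_relevance(candidates, 0.5)])
```

The importance filter drops the low-value changelog entry but keeps the Alex Beck memory; only a filter tied to the original query removes it.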

Proposed Solutions

  1. Hop-depth limiting by query type: For person/entity lookups, cap expansion at 1 hop. For topic/concept queries, allow 2-3 hops. The system could infer query type from the presence of proper nouns vs. general terms.

  2. Path relevance decay: Apply a decay multiplier at each hop. If the seed result scores 0.72, the first hop should require at least ~0.5 relevance to the original query (not just to the intermediate node), second hop ~0.35, etc.

  3. Hub node detection: Identify high-connectivity "hub" nodes (like entity:organizations:automem, entity:people:jack) that connect to many unrelated memories, and deprioritize expansion through them. These nodes are structurally important but semantically promiscuous.

  4. Entity-type-aware expansion: When the query is about a person, only expand through entity:people:* nodes, not through entity:organizations:* or entity:tools:*. This prevents the tool/org bridge problem.

  5. Configurable max_hops parameter: Let callers explicitly set maximum graph traversal depth (default: 2, person queries: 1).
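A rough sketch of how proposals 2, 3, and 5 could fit together in one traversal. Everything here is an assumption for discussion: `expand_with_limits`, `hop_threshold`, the `relevance` callback, the `hub_degree` cutoff, and the 0.7 decay factor are hypothetical, not existing AutoMem code. The decay factor was picked because 0.72 × 0.7 ≈ 0.50 and 0.72 × 0.7² ≈ 0.35, which matches the numbers in proposal 2. For simplicity, hubs are not expanded at all here rather than merely deprioritized.

```python
from collections import deque
from typing import Callable, Dict, List


def hop_threshold(seed_score: float, hop: int, decay: float = 0.7) -> float:
    """Minimum relevance to the ORIGINAL query required at a given hop."""
    return seed_score * decay ** hop


def expand_with_limits(
    graph: Dict[str, List[str]],
    seed: str,
    seed_score: float,
    relevance: Callable[[str], float],  # node -> relevance to the original query
    max_hops: int = 2,
    hub_degree: int = 4,
    decay: float = 0.7,
) -> Dict[str, int]:
    """Returns {node: hop} for nodes that survive all three limits."""
    accepted = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        hop = accepted[node]
        if hop == max_hops:
            continue  # proposal 5: explicit depth cap
        if node != seed and len(graph.get(node, [])) >= hub_degree:
            continue  # proposal 3: keep the hub itself, but do not expand through it
        for neighbor in graph.get(node, []):
            if neighbor in accepted:
                continue
            # proposal 2: the bar is relative to the seed score and decays per hop
            if relevance(neighbor) >= hop_threshold(seed_score, hop + 1, decay):
                accepted[neighbor] = hop + 1
                queue.append(neighbor)
    return accepted
```

A person query would call this with `max_hops=1`. A degree cutoff is the crudest possible hub test; a real implementation would probably want something relative to the graph's degree distribution.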

Related Issues

Impact

Uncontrolled expansion is the primary reason recall returns 40 results when 5 would suffice. Each irrelevant result costs context tokens and degrades the LLM's ability to synthesize a useful response.

Labels: enhancement (New feature or request)
