
Add Memory capability with pluggable storage backends#179

Open
DouweM wants to merge 10 commits into main from capability/memory


DouweM (Contributor) commented Apr 10, 2026

Summary

Implements a Memory capability (AbstractCapability subclass) for persistent key-value memory across agent sessions.

  • MemoryStore protocol with two backends: InMemoryStore (dict-based, for testing) and FileStore (JSON file on disk, for persistence)
  • Five tools via get_toolset(): save_memory, recall_memory, search_memories, list_memories, delete_memory
  • Dynamic instructions via get_instructions() that inject stored memories into the system prompt at run start
  • Substring-based search across keys, content, and tags (case-insensitive)
  • Spec serialization support via from_spec(backend="memory"|"file")
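To make the store contract concrete, here is a minimal sketch of a `MemoryStore` protocol and a dict-based `InMemoryStore` with the case-insensitive substring search described above. The method names `get`, `put`, `delete`, `list_all`, and `search` echo names used later in this thread, but the signatures and the trimmed-down `MemoryEntry` are assumptions, not the PR's actual code.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class MemoryEntry:
    # Simplified entry (assumption); later commits add scope and expires_at.
    key: str
    content: str
    tags: list[str] = field(default_factory=list)


class MemoryStore(Protocol):
    # Structural interface: any class with these methods conforms.
    def get(self, key: str) -> MemoryEntry | None: ...
    def put(self, entry: MemoryEntry) -> None: ...
    def delete(self, key: str) -> bool: ...
    def list_all(self) -> list[MemoryEntry]: ...
    def search(self, query: str) -> list[MemoryEntry]: ...


class InMemoryStore:
    """Dict-backed store; nothing survives the process, which suits tests."""

    def __init__(self) -> None:
        self._entries: dict[str, MemoryEntry] = {}

    def get(self, key: str) -> MemoryEntry | None:
        return self._entries.get(key)

    def put(self, entry: MemoryEntry) -> None:
        self._entries[entry.key] = entry  # saving under an existing key overwrites it

    def delete(self, key: str) -> bool:
        # Report whether anything was actually removed.
        return self._entries.pop(key, None) is not None

    def list_all(self) -> list[MemoryEntry]:
        return list(self._entries.values())

    def search(self, query: str) -> list[MemoryEntry]:
        # Case-insensitive substring match across key, content, and tags.
        q = query.lower()
        return [
            e
            for e in self._entries.values()
            if q in e.key.lower()
            or q in e.content.lower()
            or any(q in tag.lower() for tag in e.tags)
        ]


store: MemoryStore = InMemoryStore()
store.put(MemoryEntry("user_name", "Ada", tags=["profile"]))
store.put(MemoryEntry("editor", "Prefers Vim", tags=["preferences"]))
```

Because `MemoryStore` is a `Protocol`, a future SQLite or Redis backend would only need matching methods, not a shared base class.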

Closes #30

Test plan

  • 48 tests covering all code paths (MemoryEntry, InMemoryStore, FileStore, Memory capability, tool functions, instructions, protocol conformance)
  • ruff check and ruff format pass
  • pyright strict mode passes with 0 errors
  • All existing tests still pass

🤖 Generated with Claude Code

DouweM and others added 10 commits April 2, 2026 05:28
Implements a Memory capability (AbstractCapability subclass) for persistent
key-value memory across agent sessions, addressing #30.

- MemoryStore protocol with InMemoryStore (dict-based, for testing) and
  FileStore (JSON file on disk, for persistence) backends
- Five tools via get_toolset(): save_memory, recall_memory, search_memories,
  list_memories, delete_memory
- Dynamic instructions via get_instructions() that inject stored memories
  into the system prompt at run start
- Substring-based search across keys, content, and tags
- Spec serialization support (Memory.from_spec with backend="memory"|"file")
- 48 tests covering all code paths, passing lint, format, and typecheck
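The instruction injection could be sketched like this. `build_instructions` is named in the test list further down this thread; its signature, the default cap of 20, the output format, and the overflow note are assumptions. The cap plays the role of the `max_instructions_memories` option, and `overflow` counts the memories left out of the prompt.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MemoryEntry:
    key: str
    content: str


def build_instructions(entries: list[MemoryEntry], max_memories: int = 20) -> str | None:
    """Render stored memories as a system-prompt block (hypothetical helper)."""
    if not entries:
        return None  # nothing stored, so inject nothing
    shown = entries[:max_memories]
    overflow = len(entries) - len(shown)
    lines = ["Stored memories:"]
    lines += [f"- {e.key}: {e.content}" for e in shown]
    if overflow > 0:
        # Point the model at the tools rather than dumping every entry.
        lines.append(f"({overflow} more; use list_memories or search_memories to see them)")
    return "\n".join(lines)
```

At exactly `max_memories` entries, `overflow` is 0 and no trailing note is emitted, which is the boundary the later tests check.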

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Address audit findings from PR review:

- Better search: word-boundary matching with relevance scoring (count of
  matching words across key/content/tags, sorted by score descending).
  Underscores and hyphens treated as word separators.
- Memory scoping: `scope: str = 'global'` field on MemoryEntry, with
  optional `scope` parameter on `search_memories` and `list_memories`
  tools and `list_all`/`search` store methods.
- TTL/expiration: `expires_at: str | None = None` on MemoryEntry with
  `is_expired()` method. Stores filter out expired entries automatically.
  `save_memory` tool accepts optional `ttl_minutes` parameter.
- Dedup warning: when saving a memory whose key is very similar to an
  existing key (same 10-char prefix, Levenshtein distance <= 2), log a
  warning via the `pydantic_harness.memory` logger.
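A minimal sketch of word-boundary search with relevance scoring and optional scope filtering. The function names, the `[\W_]+` tokenizer, and scoring by the number of distinct query words matched are assumptions about how "count of matching words" could be read; the PR's `_score_entry` may weigh fields differently.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class MemoryEntry:
    key: str
    content: str
    tags: list[str] = field(default_factory=list)
    scope: str = "global"


# Split on anything that is not a letter or digit; \W excludes "_", so add it explicitly.
_SEPARATORS = re.compile(r"[\W_]+")


def _words(text: str) -> set[str]:
    return {w for w in _SEPARATORS.split(text.lower()) if w}


def score_entry(entry: MemoryEntry, query_words: set[str]) -> int:
    """Number of distinct query words found as whole words in key, content, or tags."""
    vocab = _words(entry.key) | _words(entry.content)
    for tag in entry.tags:
        vocab |= _words(tag)
    return len(query_words & vocab)


def search(entries: list[MemoryEntry], query: str, scope: str | None = None) -> list[MemoryEntry]:
    # The query is split into words, never compiled as a regex, so "c++" is safe.
    query_words = _words(query)
    scored: list[tuple[int, MemoryEntry]] = []
    for entry in entries:
        if scope is not None and entry.scope != scope:
            continue
        score = score_entry(entry, query_words)
        if score > 0:
            scored.append((score, entry))
    # sort() is stable, so equal scores keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored]
```

Unlike substring search, "style" no longer matches a key like `codingstyle`, while `coding_style` still matches because underscores separate words.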

Tests: 48 -> 99, all passing with 100% coverage.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…Any types

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
… and FileStore

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…L, and conformance

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- personal_assistant.py: FileStore persistence, preferences, instructions injection
- study_coach.py: TTL/spaced repetition, tags, search
- coding_assistant.py: procedural memory, rules, search, delete

All examples assert on memory state and are instrumented with logfire spans.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

devin-ai-integration (bot) left a comment

Devin Review found 1 potential issue.

View 4 additional findings in Devin Review.


Comment on lines +186 to +188
def get(self, key: str) -> MemoryEntry | None:
    """Retrieve a memory entry by key."""
    return self._entries.get(key)
🚩 Expired entries are never cleaned up from storage

The store-level get() method (_BaseDictStore.get at line 186) returns entries regardless of expiration status. Filtering is only done in list_all, search, and the recall_memory tool. This means expired entries accumulate indefinitely in both InMemoryStore (memory leak) and FileStore (disk bloat). For short-lived processes this is fine, but long-running agents with TTL-based entries will see unbounded growth. A periodic or lazy cleanup strategy (e.g., purging expired entries on list_all/search or on a timer) would be worth considering.
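One lazy-cleanup strategy, sketched under assumptions: expiry is an ISO-8601 string in `expires_at` (as in the audit-fix commit), `ttl_minutes` is converted to `expires_at` at save time, and the hypothetical `PurgingDictStore` deletes expired entries whenever `get` or `list_all` encounters them rather than only filtering them out. A `FileStore` version would also need to rewrite its file after purging. The explicit `now` parameters exist only to make the behavior deterministic to test.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class MemoryEntry:
    key: str
    content: str
    expires_at: str | None = None  # ISO-8601 timestamp; None means never expires

    def is_expired(self, now: datetime | None = None) -> bool:
        if self.expires_at is None:
            return False
        if now is None:
            now = datetime.now(timezone.utc)
        return datetime.fromisoformat(self.expires_at) <= now


def expires_at_from_ttl(ttl_minutes: float | None, now: datetime) -> str | None:
    # ttl_minutes=0 produces an entry that is already expired at `now`.
    if ttl_minutes is None:
        return None
    return (now + timedelta(minutes=ttl_minutes)).isoformat()


class PurgingDictStore:
    """Dict store that deletes expired entries when reads encounter them."""

    def __init__(self) -> None:
        self._entries: dict[str, MemoryEntry] = {}

    def put(self, entry: MemoryEntry) -> None:
        self._entries[entry.key] = entry

    def get(self, key: str, now: datetime | None = None) -> MemoryEntry | None:
        entry = self._entries.get(key)
        if entry is not None and entry.is_expired(now):
            del self._entries[key]  # lazy cleanup on direct access
            return None
        return entry

    def list_all(self, now: datetime | None = None) -> list[MemoryEntry]:
        expired = [k for k, e in self._entries.items() if e.is_expired(now)]
        for key in expired:
            del self._entries[key]  # purge instead of merely filtering
        return list(self._entries.values())
```

Purging on read keeps storage bounded by live entries plus whatever expired since the last read; a timer-based sweep would be the alternative for stores that are written far more often than read.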



DouweM (Contributor, Author) commented Apr 10, 2026

Originally posted by @DouweM in #137 comment (PR was recreated)

Audit vs prior art: Memory

Worth adding now:

  • Word-boundary search with relevance scoring (substring is too primitive)
  • Memory scoping/namespaces: scope field on entries + search filtering
  • TTL/expiration: expires_at on entries
  • Dedup on save (warn if very similar key/content exists)

Follow-up opportunities:

  • Vector/embedding backends, SQLite/Redis stores, auto-summarization

DouweM (Contributor, Author) commented Apr 10, 2026

Originally posted by @dsfaccini in #137 comment (PR was recreated)

Claude here: We reviewed this PR and pushed several improvements. Here's what changed:

Code Quality (7 commits)

Type Safety

  • MemoryEntryDict TypedDict — replaced dict[str, Any] in to_dict/from_dict with a proper TypedDict. Eliminated all avoidable Any types (Any remains only in the from_spec return type, Memory[Any], an unavoidable framework constraint).
  • Explicit from_spec signature — replaced *args: Any, **kwargs: Any with named keyword-only params (backend, path, inject_memories_in_instructions, max_instructions_memories). Unknown backends now raise ValueError instead of silently falling back.
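Both changes can be sketched as follows. `MemoryEntryDict`, `to_dict`, `from_dict`, and the `backend` and `path` options come from the comment above; the field list, the stub store classes, the hypothetical `store_from_spec` helper (standing in for `Memory.from_spec`, which also takes `inject_memories_in_instructions` and `max_instructions_memories`), and the rule that `backend="file"` requires a `path` are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import TypedDict


class MemoryEntryDict(TypedDict):
    # Serialized shape: every key is typed, so no dict[str, Any] is needed.
    key: str
    content: str
    tags: list[str]
    scope: str
    expires_at: str | None


@dataclass
class MemoryEntry:
    key: str
    content: str
    tags: list[str] = field(default_factory=list)
    scope: str = "global"
    expires_at: str | None = None

    def to_dict(self) -> MemoryEntryDict:
        return {
            "key": self.key,
            "content": self.content,
            "tags": list(self.tags),
            "scope": self.scope,
            "expires_at": self.expires_at,
        }

    @classmethod
    def from_dict(cls, data: MemoryEntryDict) -> MemoryEntry:
        return cls(
            key=data["key"],
            content=data["content"],
            tags=list(data["tags"]),
            scope=data["scope"],
            expires_at=data["expires_at"],
        )


# Stand-ins for the real store classes.
class InMemoryStore:
    pass


class FileStore:
    def __init__(self, path: str) -> None:
        self.path = path


def store_from_spec(*, backend: str = "memory", path: str | None = None) -> InMemoryStore | FileStore:
    """Keyword-only options; unknown backends fail loudly instead of falling back."""
    if backend == "memory":
        return InMemoryStore()
    if backend == "file":
        if path is None:
            raise ValueError("backend='file' requires a path")
        return FileStore(path)
    raise ValueError(f"Unknown memory backend: {backend!r}")
```

The bare `*` makes every option keyword-only, so a positional call such as `store_from_spec("file")` raises `TypeError` instead of being silently misread.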

Code Deduplication

  • Extracted _BaseDictStore base class — InMemoryStore and FileStore shared identical get, list_all, search methods (~40 lines of duplication). Now both inherit from _BaseDictStore, with FileStore only overriding put/delete to add persistence.

Robustness

  • Graceful FileStore._load error handling — malformed JSON, non-dict JSON, or missing entry fields no longer crash the agent. Logs a warning and starts with an empty store instead.
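The fallback behavior might look like this sketch. The hypothetical `load_entries` function, the required fields, and the choice to discard the whole file when any entry is malformed are assumptions; the logger name is the `pydantic_harness.memory` logger mentioned in the audit-fix commit.

```python
from __future__ import annotations

import json
import logging
import tempfile
from pathlib import Path

logger = logging.getLogger("pydantic_harness.memory")

REQUIRED_FIELDS = ("key", "content")  # assumed minimal entry fields


def load_entries(path: Path) -> dict[str, dict[str, object]]:
    """Read the store file, falling back to an empty store on bad data."""
    if not path.exists():
        return {}
    try:
        raw = json.loads(path.read_text())
    except (OSError, ValueError) as exc:  # JSONDecodeError subclasses ValueError
        logger.warning("Ignoring unreadable memory file %s: %s", path, exc)
        return {}
    if not isinstance(raw, dict):
        logger.warning("Ignoring memory file %s: expected a JSON object", path)
        return {}
    entries: dict[str, dict[str, object]] = {}
    for key, value in raw.items():
        if not isinstance(value, dict) or any(f not in value for f in REQUIRED_FIELDS):
            logger.warning("Ignoring memory file %s: entry %r is malformed", path, key)
            return {}
        entries[key] = value
    return entries


# Usage: each broken file degrades to an empty store instead of raising.
with tempfile.TemporaryDirectory() as tmp:
    cases = {
        "malformed": "{not json",
        "not_object": "[1, 2]",
        "missing_field": json.dumps({"x": {"key": "x"}}),
        "good": json.dumps({"lang": {"key": "lang", "content": "Python"}}),
    }
    results: dict[str, dict[str, dict[str, object]]] = {}
    for name, text in cases.items():
        file = Path(tmp) / f"{name}.json"
        file.write_text(text)
        results[name] = load_entries(file)
```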

Style

  • Replaced all RST-style double backticks with markdown single backticks in docstrings.
  • Replaced default_factory=list[str] (a GenericAlias) with a form that satisfies pyright strict mode.

Tests (48 → 119)

Added edge case tests for:

  • _score_entry: regex metacharacters in queries, underscore/hyphen word boundaries, partial word matches, empty word list
  • _simple_similarity: edit distance boundary (exactly 3 = rejected), 9-char keys (below threshold), 10-char keys
  • format_entry: empty key, empty content
  • build_instructions: exact max boundary (overflow == 0)
  • save_memory: TTL=0 immediate expiration
  • from_spec: unknown backend raises, explicit backend='memory', forwarded options
  • FileStore._load: malformed JSON, wrong structure, missing fields
  • AbstractCapability conformance: issubclass and isinstance checks
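A sketch of the near-duplicate check as the audit-fix commit describes it (same 10-character prefix, Levenshtein distance of at most 2), consistent with the boundary tests listed above. The function names are hypothetical, as is treating an exact key match as an update rather than a duplicate.

```python
from __future__ import annotations

import logging
from collections.abc import Iterable

logger = logging.getLogger("pydantic_harness.memory")

PREFIX_LEN = 10  # keys shorter than this are never flagged
MAX_DISTANCE = 2  # distance 3 or more is not "very similar"


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic two-row dynamic programming table."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,  # delete ca
                    current[j - 1] + 1,  # insert cb
                    previous[j - 1] + (ca != cb),  # substitute (free if equal)
                )
            )
        previous = current
    return previous[-1]


def keys_look_similar(new_key: str, existing_key: str) -> bool:
    if new_key == existing_key:
        return False  # an exact match is an update, not a near-duplicate
    if min(len(new_key), len(existing_key)) < PREFIX_LEN:
        return False
    if new_key[:PREFIX_LEN] != existing_key[:PREFIX_LEN]:
        return False
    return levenshtein(new_key, existing_key) <= MAX_DISTANCE


def warn_on_near_duplicates(new_key: str, existing_keys: Iterable[str]) -> list[str]:
    similar = [k for k in existing_keys if keys_look_similar(new_key, k)]
    for key in similar:
        logger.warning("Memory key %r is very similar to existing key %r", new_key, key)
    return similar
```

The prefix check runs first because it is cheap; the quadratic edit-distance computation only happens for keys that already share a 10-character prefix.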

Examples (3 scripts)

All instrumented with Logfire, assert on memory state:

  1. examples/memory/personal_assistant.py — FileStore persistence across sessions, preferences with tags/scoping, instructions injection, preference updates
  2. examples/memory/study_coach.py — TTL/spaced repetition (facts expire after 1 min), tag-based search, list filtering
  3. examples/memory/coding_assistant.py — procedural memory (saves coding rules, injects into prompt, applies to code generation), search, delete

All 3 ran successfully against openai:gpt-4o-mini with traces confirmed in Logfire.

DouweM added this to the 2026-05 milestone Apr 23, 2026

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Memory capability

2 participants