Add Memory capability with pluggable storage backends by DouweM · Pull Request #179 · pydantic/pydantic-ai-harness

DouweM · 2026-04-10T01:02:32Z

Summary

Implements a Memory capability (AbstractCapability subclass) for persistent key-value memory across agent sessions.

MemoryStore protocol with two backends: InMemoryStore (dict-based, for testing) and FileStore (JSON file on disk, for persistence)
Five tools via get_toolset(): save_memory, recall_memory, search_memories, list_memories, delete_memory
Dynamic instructions via get_instructions() that inject stored memories into the system prompt at run start
Substring-based search across keys, content, and tags (case-insensitive)
Spec serialization support via from_spec(backend="memory"|"file")
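To make the storage abstraction above concrete, here is a minimal sketch of what a `MemoryStore` protocol and a dict-based `InMemoryStore` could look like. The method names `get`, `put`, `delete`, `list_all` and `search` appear elsewhere in this PR, but the exact signatures, the `MemoryEntry` fields shown, and the return types are assumptions for illustration rather than the PR's actual code.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol, runtime_checkable


@dataclass
class MemoryEntry:
    """Simplified entry; the real MemoryEntry has more fields (assumed shape)."""

    key: str
    content: str
    tags: list[str] = field(default_factory=list)


@runtime_checkable
class MemoryStore(Protocol):
    """Structural interface that any backend must satisfy (assumed signatures)."""

    def get(self, key: str) -> MemoryEntry | None: ...
    def put(self, entry: MemoryEntry) -> None: ...
    def delete(self, key: str) -> bool: ...
    def list_all(self) -> list[MemoryEntry]: ...
    def search(self, query: str) -> list[MemoryEntry]: ...


class InMemoryStore:
    """Dict-backed store, suitable for tests; nothing is persisted."""

    def __init__(self) -> None:
        self._entries: dict[str, MemoryEntry] = {}

    def get(self, key: str) -> MemoryEntry | None:
        return self._entries.get(key)

    def put(self, entry: MemoryEntry) -> None:
        self._entries[entry.key] = entry

    def delete(self, key: str) -> bool:
        # True if something was actually removed.
        return self._entries.pop(key, None) is not None

    def list_all(self) -> list[MemoryEntry]:
        return list(self._entries.values())

    def search(self, query: str) -> list[MemoryEntry]:
        # Case-insensitive substring match across key, content, and tags.
        q = query.lower()
        return [
            e
            for e in self._entries.values()
            if q in e.key.lower()
            or q in e.content.lower()
            or any(q in tag.lower() for tag in e.tags)
        ]
```

Because the protocol is structural, `InMemoryStore` does not need to inherit from `MemoryStore`; any class with matching methods would be accepted wherever a `MemoryStore` is expected.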

Closes #30

Test plan

48 tests covering all code paths (MemoryEntry, InMemoryStore, FileStore, Memory capability, tool functions, instructions, protocol conformance)
ruff check and ruff format pass
pyright strict mode passes with 0 errors
All existing tests still pass

🤖 Generated with Claude Code

Implements a Memory capability (AbstractCapability subclass) for persistent key-value memory across agent sessions, addressing #30.

- MemoryStore protocol with InMemoryStore (dict-based, for testing) and FileStore (JSON file on disk, for persistence) backends
- Five tools via get_toolset(): save_memory, recall_memory, search_memories, list_memories, delete_memory
- Dynamic instructions via get_instructions() that inject stored memories into the system prompt at run start
- Substring-based search across keys, content, and tags
- Spec serialization support (Memory.from_spec with backend="memory"|"file")
- 48 tests covering all code paths, passing lint, format, and typecheck

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Address audit findings from PR review:

- Better search: word-boundary matching with relevance scoring (count of matching words across key/content/tags, sorted by score descending). Underscores and hyphens treated as word separators.
- Memory scoping: `scope: str = 'global'` field on MemoryEntry, with optional `scope` parameter on `search_memories` and `list_memories` tools and `list_all`/`search` store methods.
- TTL/expiration: `expires_at: str | None = None` on MemoryEntry with `is_expired()` method. Stores filter out expired entries automatically. `save_memory` tool accepts optional `ttl_minutes` parameter.
- Dedup warning: when saving a memory whose key is very similar to an existing key (same 10-char prefix, Levenshtein distance <= 2), log a warning via the `pydantic_harness.memory` logger.

Tests: 48 -> 99, all passing with 100% coverage.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
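The word-boundary search with relevance scoring described in this commit could be sketched roughly as follows. The helper names (`tokenize`, `score_entry`, `search`) and the tuple representation of an entry are hypothetical, and this sketch only counts whole-word matches; how the PR's own `_score_entry` treats partial words is not shown in the description.

```python
from __future__ import annotations

import re

# Any run of letters/digits is a word, so underscores, hyphens, and all
# other punctuation act as separators.
_WORD_RE = re.compile(r"[a-z0-9]+")

# (key, content, tags): a stand-in for MemoryEntry in this sketch.
Entry = tuple[str, str, list[str]]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(_WORD_RE.findall(text.lower()))


def score_entry(query_words: set[str], key: str, content: str, tags: list[str]) -> int:
    """Count query words found in each of key, content, and tags."""
    score = 0
    for field_text in (key, content, " ".join(tags)):
        field_words = tokenize(field_text)
        score += sum(1 for word in query_words if word in field_words)
    return score


def search(query: str, entries: list[Entry]) -> list[Entry]:
    """Return entries with at least one matching word, best score first."""
    query_words = tokenize(query)
    scored = [(score_entry(query_words, *entry), entry) for entry in entries]
    matches = [(score, entry) for score, entry in scored if score > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in matches]
```

Since the query is tokenized rather than compiled into a pattern, regex metacharacters in user input are simply dropped and cannot raise `re.error`.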

- personal_assistant.py: FileStore persistence, preferences, instructions injection
- study_coach.py: TTL/spaced repetition, tags, search
- coding_assistant.py: procedural memory, rules, search, delete

All examples assert on memory state and are instrumented with logfire spans.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

devin-ai-integration

Devin Review found 1 potential issue.

View 4 additional findings in Devin Review.

devin-ai-integration · 2026-04-10T01:07:06Z

+    def get(self, key: str) -> MemoryEntry | None:
+        """Retrieve a memory entry by key."""
+        return self._entries.get(key)


🚩 Expired entries are never cleaned up from storage

The store-level get() method (_BaseDictStore.get at line 186) returns entries regardless of expiration status. Filtering is only done in list_all, search, and the recall_memory tool. This means expired entries accumulate indefinitely in both InMemoryStore (memory leak) and FileStore (disk bloat). For short-lived processes this is fine, but long-running agents with TTL-based entries will see unbounded growth. A periodic or lazy cleanup strategy (e.g., purging expired entries on list_all/search or on a timer) would be worth considering.
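One way the lazy cleanup suggested above might look is sketched below. `PurgingStore`, `_purge_expired`, and the optional `now` parameter (added so the behavior can be tested deterministically) are hypothetical; only `expires_at`, `is_expired()`, `get`, and `list_all` are names taken from the PR, and their real signatures may differ.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class MemoryEntry:
    key: str
    content: str
    expires_at: str | None = None  # ISO 8601 timestamp, or None for no expiry

    def is_expired(self, now: datetime | None = None) -> bool:
        if self.expires_at is None:
            return False
        now = now or datetime.now(timezone.utc)
        return datetime.fromisoformat(self.expires_at) <= now


class PurgingStore:
    """Dict-backed store that drops expired entries when they are encountered."""

    def __init__(self) -> None:
        self._entries: dict[str, MemoryEntry] = {}

    def put(self, entry: MemoryEntry) -> None:
        self._entries[entry.key] = entry

    def _purge_expired(self, now: datetime | None = None) -> int:
        """Delete every expired entry and return how many were removed."""
        expired = [k for k, e in self._entries.items() if e.is_expired(now)]
        for key in expired:
            del self._entries[key]
        return len(expired)

    def get(self, key: str, now: datetime | None = None) -> MemoryEntry | None:
        entry = self._entries.get(key)
        if entry is not None and entry.is_expired(now):
            # Lazy cleanup: remove the stale entry instead of returning it.
            del self._entries[key]
            return None
        return entry

    def list_all(self, now: datetime | None = None) -> list[MemoryEntry]:
        self._purge_expired(now)
        return list(self._entries.values())
```

A file-backed variant would also need to rewrite the JSON file after a purge; whether to do that on every read or only when something was actually removed is a trade-off between disk writes and disk bloat.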

DouweM · 2026-04-10T15:07:54Z

Originally posted by @DouweM in #137 comment (PR was recreated)

Audit vs prior art: Memory

Worth adding now:

Word-boundary search with relevance scoring (substring is too primitive)
Memory scoping/namespaces: scope field on entries + search filtering
TTL/expiration: expires_at on entries
Dedup on save (warn if very similar key/content exists)
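For the dedup item, the follow-up commit describes a key-similarity check (same 10-character prefix and Levenshtein distance of at most 2) that logs a warning through the `pydantic_harness.memory` logger. A rough sketch under those stated thresholds follows; the function names are hypothetical, and treating keys shorter than 10 characters or identical keys as "not similar" is an interpretation, not something the PR text confirms.

```python
from __future__ import annotations

import logging

logger = logging.getLogger("pydantic_harness.memory")


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance using two rows."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,  # deletion
                    current[j - 1] + 1,  # insertion
                    previous[j - 1] + cost,  # substitution
                )
            )
        previous = current
    return previous[-1]


def is_similar_key(new: str, existing: str, prefix_len: int = 10, max_distance: int = 2) -> bool:
    """True if the keys share a prefix and differ by only a few edits."""
    if new == existing:
        return False  # assumed: saving to the same key is an update, not a duplicate
    if len(new) < prefix_len or len(existing) < prefix_len:
        return False  # assumed: short keys are below the threshold
    if new[:prefix_len] != existing[:prefix_len]:
        return False
    return levenshtein(new, existing) <= max_distance


def find_similar_keys(new_key: str, existing_keys: list[str]) -> list[str]:
    """Return near-duplicate keys and log a warning for each one."""
    similar = [key for key in existing_keys if is_similar_key(new_key, key)]
    for key in similar:
        logger.warning("Key %r is very similar to existing key %r", new_key, key)
    return similar
```

Checking the cheap prefix condition first means the quadratic edit-distance computation only runs for keys that already look related.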

Follow-up opportunities:

Vector/embedding backends, SQLite/Redis stores, auto-summarization

DouweM · 2026-04-10T15:07:55Z

Originally posted by @dsfaccini in #137 comment (PR was recreated)

Claude here: We reviewed this PR and pushed several improvements. Here's what changed:

Code Quality (7 commits)

Type Safety

MemoryEntryDict TypedDict — replaced dict[str, Any] in to_dict/from_dict with a proper TypedDict. Eliminated all avoidable Any types (Any remains only in from_spec return Memory[Any], an unavoidable framework constraint).
Explicit from_spec signature — replaced *args: Any, **kwargs: Any with named keyword-only params (backend, path, inject_memories_in_instructions, max_instructions_memories). Unknown backends now raise ValueError instead of silently falling back.
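As an illustration of the explicit `from_spec` signature, the sketch below uses a stand-in class, `MemorySketch`, with the four keyword-only parameters named above. The default values, the requirement that `backend="file"` comes with a `path`, and the error messages are assumptions; the real `Memory.from_spec` builds an actual capability with a store attached.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass
class MemorySketch:
    """Stand-in for the Memory capability, holding only the spec options."""

    backend: str
    path: Path | None
    inject_memories_in_instructions: bool
    max_instructions_memories: int

    @classmethod
    def from_spec(
        cls,
        *,
        backend: str = "memory",
        path: str | None = None,
        inject_memories_in_instructions: bool = True,
        max_instructions_memories: int = 20,
    ) -> MemorySketch:
        if backend == "memory":
            resolved_path = None
        elif backend == "file":
            if path is None:
                raise ValueError("backend='file' requires a path")
            resolved_path = Path(path)
        else:
            # Fail loudly instead of silently falling back to a default store.
            raise ValueError(f"Unknown memory backend: {backend!r}")
        return cls(
            backend=backend,
            path=resolved_path,
            inject_memories_in_instructions=inject_memories_in_instructions,
            max_instructions_memories=max_instructions_memories,
        )
```

Keyword-only parameters also mean a typo such as `bakend="file"` raises a `TypeError` at the call site, which `**kwargs` would have swallowed.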

Code Deduplication

Extracted _BaseDictStore base class — InMemoryStore and FileStore shared identical get, list_all, search methods (~40 lines of duplication). Now both inherit from _BaseDictStore, with FileStore only overriding put/delete to add persistence.

Robustness

Graceful FileStore._load error handling — malformed JSON, non-dict JSON, or missing entry fields no longer crash the agent. Logs a warning and starts with an empty store instead.
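The graceful-loading behavior could be approximated as in the following sketch. `load_entries` is a hypothetical free function standing in for `FileStore._load`, and the on-disk layout assumed here (a JSON object mapping each key to an object with `key` and `content` fields) is a guess at the format rather than the PR's confirmed schema.

```python
from __future__ import annotations

import json
import logging
from pathlib import Path

logger = logging.getLogger("pydantic_harness.memory")


def load_entries(path: Path) -> dict[str, dict[str, object]]:
    """Load entries from a JSON file, falling back to an empty store on bad data."""
    if not path.exists():
        return {}
    try:
        raw = json.loads(path.read_text(encoding="utf-8"))
        if not isinstance(raw, dict):
            raise TypeError(f"expected a JSON object, got {type(raw).__name__}")
        entries: dict[str, dict[str, object]] = {}
        for key, value in raw.items():
            if not isinstance(value, dict):
                raise TypeError(f"entry {key!r} is not a JSON object")
            # Indexing raises KeyError when a required field is missing.
            entries[key] = {"key": value["key"], "content": value["content"]}
        return entries
    except (OSError, ValueError, TypeError, KeyError) as exc:
        # json.JSONDecodeError is a ValueError subclass, so malformed JSON lands here.
        logger.warning("Could not load memories from %s: %s", path, exc)
        return {}
```

Starting empty keeps the agent running, at the cost that the next write may overwrite the damaged file; copying the bad file aside before writing would be one way to avoid losing data.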

Style

Replaced all RST-style double backticks with markdown single backticks in docstrings.
Fixed default_factory=list[str] (a GenericAlias, not a callable) to proper form that satisfies pyright strict.

Tests (48 → 119)

Added edge case tests for:

_score_entry: regex metacharacters in queries, underscore/hyphen word boundaries, partial word matches, empty word list
_simple_similarity: edit distance boundary (exactly 3 = rejected), 9-char keys (below threshold), 10-char keys
format_entry: empty key, empty content
build_instructions: exact max boundary (overflow == 0)
save_memory: TTL=0 immediate expiration
from_spec: unknown backend raises, explicit backend='memory', forwarded options
FileStore._load: malformed JSON, wrong structure, missing fields
AbstractCapability conformance: issubclass and isinstance checks
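The `build_instructions` boundary case in the list above (exactly `max` entries, so overflow is 0) can be illustrated with a small sketch. The output wording, the `(key, content)` tuple shape, and the default limit are all assumptions; only the idea of capping injected memories and tracking an overflow count comes from the PR text.

```python
from __future__ import annotations


def build_instructions(entries: list[tuple[str, str]], max_memories: int = 20) -> str:
    """Render up to max_memories entries as prompt text, noting any overflow."""
    if not entries:
        return ""
    shown = entries[:max_memories]
    overflow = len(entries) - len(shown)
    lines = ["Stored memories:"]
    lines += [f"- {key}: {content}" for key, content in shown]
    if overflow > 0:
        # Only mention hidden entries when some were actually cut off.
        lines.append(f"({overflow} more not shown; use list_memories to see all.)")
    return "\n".join(lines)
```

With three entries and `max_memories=3` the overflow line is omitted, while `max_memories=2` appends a note that one entry was left out.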

Examples (3 scripts)

All instrumented with Logfire, assert on memory state:

examples/memory/personal_assistant.py — FileStore persistence across sessions, preferences with tags/scoping, instructions injection, preference updates
examples/memory/study_coach.py — TTL/spaced repetition (facts expire after 1 min), tag-based search, list filtering
examples/memory/coding_assistant.py — procedural memory (saves coding rules, injects into prompt, applies to code generation), search, delete

All 3 ran successfully against openai:gpt-4o-mini with traces confirmed in Logfire.

DouweM and others added 10 commits April 2, 2026 05:28

refactor(memory): add MemoryEntryDict TypedDict, eliminate avoidable Any types

d9ce688

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

refactor(memory): extract _BaseDictStore to deduplicate InMemoryStore and FileStore

63cd254

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

fix(memory): handle malformed JSON gracefully in FileStore._load

f9b1066

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

refactor(memory): make from_spec signature explicit, raise on unknown backend

7ddf098

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

test(memory): add edge case tests for scoring, similarity, format, TTL, and conformance

11e944c

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

chore(memory): update exports and plan to reflect review changes

c9dc52c

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

chore: remove settings.local.json from tracking, restore original deps

58c70a7

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

DouweM requested review from Kludex, adtyavrdhn, dmontagu, dsfaccini and samuelcolvin as code owners April 10, 2026 01:02

devin-ai-integration Bot reviewed Apr 10, 2026

DouweM assigned dsfaccini Apr 10, 2026

DouweM removed request for Kludex, adtyavrdhn, dmontagu, dsfaccini and samuelcolvin April 10, 2026 15:12

DouweM added this to the 2026-05 milestone Apr 23, 2026

DouweM wants to merge 10 commits into main from capability/memory
