Commit 618a0dc

SonAIengine and claude committed
feat(v0.18-α2): auto graph snapshot — CLI + MCP tool + graph.chat() priming
The G1 absorption from PLAN-v0.18: agents waste their first 1-2 turns on
cold-start exploration ("what categories exist / what tables can I filter /
what entities are common"). This ships a precomputed markdown snapshot of the
graph that is injected into the system prompt, so the agent starts with that
knowledge already in hand.

What's new:

- ``synaptic.snapshot.generate_snapshot(backend) -> str``: markdown report
  covering scale, categories, top phrase hubs (mention-ranked), structured
  tables, edge kinds (sampled), and 1-3 sample query hints derived from the
  corpus shape. All stats are direct backend reads; no LLM calls. Bounded
  under ~1 s on a 100k-node graph (5 k entity scan cap, 50-probe phrase-hub
  ranking, 50-node edge sample).
- ``synaptic-snapshot`` CLI (registered in pyproject.toml): emits the same
  markdown to stdout or a file. Useful for previewing what a graph "looks
  like" before wiring up an agent.
- ``knowledge_snapshot`` MCP tool: same content, exposed to MCP clients
  (Claude Desktop, Cursor, etc.) for one-shot graph priming.
- ``SynapticGraph.chat(prime_with_snapshot=True)``: default-on priming. The
  snapshot is placed ahead of any user-supplied ``extra_context``.
  Best-effort: a snapshot failure never blocks the chat call.

Measured on KRRA: 720 docs / 18.6 k chunks / 70 k entities → 0.85 s
end-to-end snapshot generation.

This is the only Graphify (safishamsi/graphify) absorption candidate that
PLAN-v0.18 §7.1 green-lit. G2 (edge confidence/provenance), G3 (Leiden
community detection), G4 (multimodal converter), and G5 (hyperedges) were all
declined as Neo4j/GraphRAG-derivative or out of scope for the v0.18 main
track.

Tests: 11 new unit tests in tests/test_snapshot.py; the full suite passes
859/859. ruff check + format clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
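The commit message says the snapshot stays bounded through scan caps (a 5 k entity scan cap, for instance) and that phrase hubs are mention-ranked. The sketch below illustrates only that general idea. `rank_phrase_hubs`, its parameters, and the flat iterable of mention strings are assumptions made for illustration; none of them are taken from `synaptic.snapshot`, whose actual ranking logic is not shown in this commit view.

```python
from collections import Counter
from itertools import islice
from typing import Iterable


def rank_phrase_hubs(
    mentions: Iterable[str],
    max_scanned: int = 5_000,
    top_n: int = 15,
) -> list[tuple[str, int]]:
    """Count at most `max_scanned` mentions and return the `top_n` most frequent.

    Hypothetical helper: the real snapshot code may rank hubs differently.
    """
    # islice stops reading after max_scanned items, so the cost stays
    # bounded even when the underlying iterable is very large.
    counts = Counter(islice(mentions, max_scanned))
    return counts.most_common(top_n)


# Only the first four mentions are scanned because of the cap.
sample = ["seoul", "budget", "seoul", "audit", "seoul", "budget"]
hubs = rank_phrase_hubs(sample, max_scanned=4, top_n=2)
```

Raising the cap trades speed for a more faithful ranking, which is the same trade-off the `--max-entities` help text describes.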
1 parent caeab94 commit 618a0dc

9 files changed

Lines changed: 773 additions & 9 deletions

CHANGELOG.md

Lines changed: 14 additions & 0 deletions
@@ -6,6 +6,20 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 
 ## [Unreleased]
 
+### Added — v0.18-α2: Auto graph snapshot (Graphify G1 absorption)
+
+New `synaptic.snapshot` module + `synaptic-snapshot` CLI + `knowledge_snapshot`
+MCP tool + default-on priming inside `SynapticGraph.chat()`. Generates a markdown
+summary of a graph (scale, categories, top phrase hubs, structured tables, edge
+kinds, sample query hints) so an LLM agent can skip the cold-start exploration
+turns. Measured 0.85 s on KRRA (720 docs / 18.6k chunks / 70k entities). All
+stats are direct backend reads, with no LLM calls, which preserves the LLM-free
+indexing principle. `chat(prime_with_snapshot=True)` is the default and the
+priming is prepended to `extra_context`. 11 new unit tests, all green.
+
+This is the only Graphify (`safishamsi/graphify`) absorption item PLAN-v0.18
+green-lit (G2-G5 declined as Neo4j/GraphRAG-derivative or out of scope).
+
 ### Changed — `agent_loop` system prompt: relative-time + multi-source guidance
 
 Two new tip lines in the agent prompt, learned from the v0.18-α1-2 KRRA

docs/ROADMAP.md

Lines changed: 9 additions & 8 deletions
@@ -59,19 +59,20 @@ Exceeds the GPT-4o-mini baseline.
 | α1-3 | Reduce agent-loop latency: prime the first 1-2 turns to save exploration turns | 🟡 see the G1 item |
 | α1-4 | Avoid context overflow (currently 10 of 172 queries = 5.8% fail by exceeding the vLLM 16k limit) | 🔴 not started |
 
-### α2. G1 — Auto graph snapshot / agent priming
+### α2. G1 — Auto graph snapshot / agent priming ✅ ship
 
 **Problem**: At cold start the agent does not know the corpus structure, so the first 1-2 turns
 are wasted on exploration. Absorbs the UX pattern from Graphify (`safishamsi/graphify`).
 
-| # | Task |
-|---|------|
-| α2-1 | `synaptic snapshot <db> --output graph.md` CLI |
-| α2-2 | Output contents: category tree, top phrase hubs (DF), entity-table distribution, edge statistics, sample queries |
-| α2-3 | `knowledge_snapshot()` MCP tool: called once at agent start, injected into the system prompt |
-| α2-4 | Integrate into the default `graph.chat()` path |
+| # | Task | Status |
+|---|------|---|
+| α2-1 | `synaptic-snapshot <db> --output graph.md` CLI | ✅ ship — `synaptic.cli.snapshot` |
+| α2-2 | Output contents: category tree, top phrase hubs (DF), entity-table distribution, edge statistics, sample queries | ✅ ship — `synaptic.snapshot` |
+| α2-3 | `knowledge_snapshot()` MCP tool: called once at agent start, injected into the system prompt | ✅ ship — `mcp/server.py` |
+| α2-4 | Integrated into the default `graph.chat()` path with `prime_with_snapshot=True` (default) | ✅ ship |
 
-Estimated effort: 1 week. G2-G5 were demoted in the [v0.18 architecture doc](PLAN-v0.18-architecture.md).
+11 unit tests / 0.85 s on KRRA (720 docs / 18.6k chunks / 70k entities).
+G2-G5 were demoted in the [v0.18 architecture doc](PLAN-v0.18-architecture.md).
 
 ### α3. OpenIE triple experiment (recovering from the MuSiQue limitation)

pyproject.toml

Lines changed: 1 addition & 0 deletions
@@ -46,6 +46,7 @@ Changelog = "https://github.com/PlateerLab/synaptic-memory/blob/main/CHANGELOG.m
 
 [project.scripts]
 synaptic-mcp = "synaptic.mcp.server:main"
+synaptic-snapshot = "synaptic.cli.snapshot:main"
 
 [project.optional-dependencies]
 sqlite = ["aiosqlite>=0.20"]
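A `[project.scripts]` entry has the form `name = "module:callable"`: the installed wrapper imports the module, looks up the callable, and exits with its return value. The sketch below shows roughly how such a string is resolved, using a standard-library target so it runs without this package installed. `resolve_entry_point` is a hypothetical helper written for illustration, and it handles only the simple `module:attr` form, not dotted attribute paths.

```python
import importlib
from typing import Any, Callable


def resolve_entry_point(spec: str) -> Callable[..., Any]:
    """Resolve a 'module:attr' string to the object it names.

    Hypothetical helper; real installers generate an equivalent wrapper.
    """
    module_name, sep, attr = spec.partition(":")
    if not sep or not attr:
        raise ValueError(f"expected 'module:attr', got {spec!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


# With the package installed, the spec would be "synaptic.cli.snapshot:main".
# A stdlib target is used here so the example is self-contained.
dumps = resolve_entry_point("json:dumps")
encoded = dumps([1, 2])
```

Because `main()` in the new CLI module returns an `int`, the wrapper can pass that value straight to `sys.exit`, which is how the exit code 2 for a missing graph file reaches the shell.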

src/synaptic/cli/__init__.py

Whitespace-only changes.

src/synaptic/cli/snapshot.py

Lines changed: 117 additions & 0 deletions
@@ -0,0 +1,117 @@
+"""``synaptic-snapshot`` CLI — generate a markdown summary of a graph.
+
+Usage::
+
+    synaptic-snapshot path/to/graph.sqlite
+    synaptic-snapshot path/to/graph.sqlite --output report.md
+    synaptic-snapshot path/to/graph.sqlite --max-entities 20000
+
+The output is the same markdown the ``knowledge_snapshot`` MCP tool and
+``graph.chat()``'s priming path emit. Use it to preview "what does my
+graph look like" without spinning up an agent.
+"""
+
+from __future__ import annotations
+
+import argparse
+import asyncio
+import sys
+from pathlib import Path
+
+from synaptic import __version__
+
+
+def _build_parser() -> argparse.ArgumentParser:
+    p = argparse.ArgumentParser(
+        prog="synaptic-snapshot",
+        description=(
+            "Generate a markdown snapshot of a Synaptic Memory graph — "
+            "scale, categories, top phrase hubs, structured tables, edge "
+            "kinds, and sample query hints."
+        ),
+    )
+    p.add_argument("db", help="Path to the SQLite graph file (or :memory: for ephemeral)")
+    p.add_argument(
+        "-o",
+        "--output",
+        type=Path,
+        default=None,
+        help="Write the markdown to this file. Default: stdout.",
+    )
+    p.add_argument(
+        "--max-entities",
+        type=int,
+        default=5_000,
+        help="Cap on entity scan (default 5000). Higher = more accurate phrase-hub ranking, slower.",
+    )
+    p.add_argument(
+        "--top-phrase-hubs",
+        type=int,
+        default=15,
+        help="Number of phrase hubs to surface (default 15).",
+    )
+    p.add_argument(
+        "--top-categories",
+        type=int,
+        default=30,
+        help="Number of categories to list (default 30).",
+    )
+    p.add_argument(
+        "--no-sample-queries",
+        action="store_true",
+        help="Omit the sample-queries section.",
+    )
+    p.add_argument(
+        "--title",
+        default="Knowledge Graph Snapshot",
+        help="H1 heading title for the report.",
+    )
+    p.add_argument("--version", action="version", version=f"synaptic-snapshot {__version__}")
+    return p
+
+
+async def _run(args: argparse.Namespace) -> str:
+    from synaptic.backends.sqlite_graph import SqliteGraphBackend
+    from synaptic.snapshot import generate_snapshot
+
+    backend = SqliteGraphBackend(args.db)
+    await backend.connect()
+    try:
+        return await generate_snapshot(
+            backend,
+            max_entities_scanned=args.max_entities,
+            top_n_phrase_hubs=args.top_phrase_hubs,
+            top_n_categories=args.top_categories,
+            include_sample_queries=not args.no_sample_queries,
+            title=args.title,
+        )
+    finally:
+        # Best-effort close — some backends raise during shutdown.
+        close = getattr(backend, "close", None)
+        if callable(close):
+            try:
+                await close()
+            except Exception:
+                pass
+
+
+def main(argv: list[str] | None = None) -> int:
+    parser = _build_parser()
+    args = parser.parse_args(argv)
+
+    if not Path(args.db).exists() and args.db != ":memory:":
+        print(f"error: graph file not found: {args.db}", file=sys.stderr)
+        return 2
+
+    md = asyncio.run(_run(args))
+
+    if args.output is None:
+        sys.stdout.write(md)
+    else:
+        args.output.write_text(md, encoding="utf-8")
+        print(f"Wrote snapshot ({len(md)} chars) to {args.output}", file=sys.stderr)
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
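The `_run` helper above returns the snapshot from inside `try` and closes the backend in `finally`, swallowing any shutdown error. The following sketch reproduces that pattern with stand-ins so the behaviour can be checked in isolation. `FlakyBackend` and `render` are hypothetical placeholders invented for this example; they are not part of `synaptic`.

```python
import asyncio


class FlakyBackend:
    """Hypothetical stand-in for a backend whose close() raises."""

    def __init__(self) -> None:
        self.close_attempted = False

    async def connect(self) -> None:
        pass

    async def close(self) -> None:
        self.close_attempted = True
        raise RuntimeError("shutdown failed")


async def render(backend: FlakyBackend) -> str:
    """Placeholder for the real snapshot call; returns fixed markdown."""
    return "# Knowledge Graph Snapshot\n"


async def run(backend: FlakyBackend) -> str:
    await backend.connect()
    try:
        return await render(backend)
    finally:
        # Look up close() defensively and swallow shutdown errors so they
        # cannot replace the snapshot that was already produced.
        close = getattr(backend, "close", None)
        if callable(close):
            try:
                await close()
            except Exception:
                pass


backend = FlakyBackend()
markdown = asyncio.run(run(backend))
```

The trade-off is that a genuine shutdown bug stays silent. For a one-shot CLI that has already computed its output this seems acceptable, though logging the exception would be a reasonable alternative.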

src/synaptic/graph.py

Lines changed: 32 additions & 1 deletion
@@ -1208,6 +1208,7 @@ async def chat(
         max_turns: int = 5,
         system_prompt: str | None = None,
         extra_context: str | None = None,
+        prime_with_snapshot: bool = True,
     ):
         """Multi-turn agent loop — Synaptic's measured-strongest mode.
 
@@ -1234,6 +1235,13 @@ async def chat(
                 context is always appended.
             extra_context: Additional per-corpus instructions appended
                 to the system prompt.
+            prime_with_snapshot: If True (default), inject a markdown
+                snapshot of the graph (categories, top phrase hubs,
+                tables) into the system prompt to skip the agent's
+                cold-start exploration turns. Set to False on very
+                large graphs (>100k nodes) where the snapshot overhead
+                approaches the saved-turn benefit, or when ``extra_context``
+                already provides equivalent priming.
 
         Returns:
             :class:`synaptic.agent_loop.AgentSearchResult` with
@@ -1252,6 +1260,29 @@ async def chat(
         """
         from synaptic.agent_loop import run_agent_loop
 
+        # Auto-priming via graph snapshot — measured value: cold-start
+        # agent typically wastes turn 0 on "what categories exist /
+        # what tables can I filter". Snapshot preempts that probe by
+        # injecting the answer up front. Cheap (sub-second on 100k
+        # nodes), additive to the existing build_graph_context already
+        # done inside run_agent_loop.
+        priming_context = extra_context or ""
+        if prime_with_snapshot:
+            try:
+                from synaptic.snapshot import generate_snapshot
+
+                snapshot_md = await generate_snapshot(self._backend, include_sample_queries=True)
+                priming_block = (
+                    "[Graph snapshot — already provided so you can skip "
+                    "the usual probe turns]\n\n" + snapshot_md
+                )
+                priming_context = (
+                    priming_block + "\n\n" + priming_context if priming_context else priming_block
+                )
+            except Exception:
+                # Snapshot is best-effort priming; never block the chat.
+                pass
+
         return await run_agent_loop(
             client=llm_client,
             backend=self._backend,
@@ -1260,7 +1291,7 @@ async def chat(
             max_turns=max_turns,
             embedder=self._embedder,
             system_prompt=system_prompt,
-            extra_context=extra_context,
+            extra_context=priming_context or None,
         )
 
     async def search(
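The priming logic in this hunk comes down to a small string-composition rule: the snapshot block goes first, any caller-supplied `extra_context` follows after a blank line, and an empty result collapses to `None`. The sketch below isolates that rule as a pure function so its four cases are easy to check. `compose_priming` and the shortened `HEADER` text are assumptions for illustration; the commit keeps this logic inline in `chat()` and uses a longer header string.

```python
from __future__ import annotations

# Placeholder header; the real code uses a longer bracketed sentence.
HEADER = "[Graph snapshot]\n\n"


def compose_priming(snapshot_md: str | None, extra_context: str | None) -> str | None:
    """Combine an optional snapshot with optional extra context.

    snapshot_md is None when priming is disabled or snapshot generation failed.
    """
    priming = extra_context or ""
    if snapshot_md is not None:
        block = HEADER + snapshot_md
        # Snapshot first, then the caller's context separated by a blank line.
        priming = block + "\n\n" + priming if priming else block
    # An empty string is normalised to None, matching `priming_context or None`.
    return priming or None


both = compose_priming("# Snapshot", "Prefer recent documents.")
only_snapshot = compose_priming("# Snapshot", None)
only_context = compose_priming(None, "Prefer recent documents.")
neither = compose_priming(None, None)
```

Putting the snapshot ahead of the user's context means corpus-specific instructions are read last.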

src/synaptic/mcp/server.py

Lines changed: 43 additions & 0 deletions
@@ -345,6 +345,49 @@ async def knowledge_stats() -> dict[str, Any]:
     return {"success": True, **{k: v for k, v in stats.items()}}
 
 
+@server.tool()
+async def knowledge_snapshot(
+    max_entities: int = 5_000,
+    top_phrase_hubs: int = 15,
+    top_categories: int = 30,
+    include_sample_queries: bool = True,
+) -> dict[str, Any]:
+    """Generate a markdown snapshot of the graph — for agent priming.
+
+    Returns a compact human-readable summary the agent can read at the
+    start of a session to skip the usual cold-start exploration turns
+    (probing categories / tables / entities). Sections covered:
+
+    - Scale (documents, chunks, phrase hubs, structured rows, edges)
+    - Categories (with doc counts) — usable as ``deep_search(category=)``
+    - Top phrase hubs (mention-ranked) — likely good search anchors
+    - Tables (structured data) — for ``filter/aggregate/join`` tools
+    - Edge types (sampled) — for ``follow``
+    - Sample queries — 1-3 illustrative tool invocations
+
+    All stats are computed from direct backend reads — no LLM calls.
+
+    Args:
+        max_entities: Cap on entity scan (default 5000). Higher = more
+            accurate phrase-hub ranking on large corpora, slower.
+        top_phrase_hubs: How many phrase hubs to surface (default 15).
+        top_categories: How many categories to list (default 30).
+        include_sample_queries: Append a "Sample queries" section with
+            1-3 hint invocations derived from the corpus shape.
+    """
+    graph = await _ensure_graph()
+    from synaptic.snapshot import generate_snapshot
+
+    md = await generate_snapshot(
+        graph._backend,
+        max_entities_scanned=max_entities,
+        top_n_phrase_hubs=top_phrase_hubs,
+        top_n_categories=top_categories,
+        include_sample_queries=include_sample_queries,
+    )
+    return {"success": True, "format": "markdown", "snapshot": md, "length": len(md)}
+
+
 @server.tool()
 async def knowledge_export(
     output_format: str = "markdown",
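The tool returns a dict with the keys `success`, `format`, `snapshot`, and `length`. A caller that receives this payload may want to validate it before pasting the markdown into a prompt. The sketch below shows one way to do that, assuming the payload arrives as a plain Python dict. `extract_snapshot` is a hypothetical client-side helper, not part of the MCP server or of any MCP client library.

```python
from typing import Any


def extract_snapshot(payload: dict[str, Any]) -> str:
    """Return the markdown from a knowledge_snapshot-style payload.

    Hypothetical helper; raises ValueError when the payload looks wrong.
    """
    if not payload.get("success"):
        raise ValueError("snapshot tool reported failure")
    if payload.get("format") != "markdown":
        raise ValueError(f"unexpected format: {payload.get('format')!r}")
    snapshot = payload.get("snapshot")
    if not isinstance(snapshot, str):
        raise ValueError("payload has no snapshot text")
    # The reported length lets the caller detect truncation in transit.
    if payload.get("length") != len(snapshot):
        raise ValueError("reported length does not match snapshot text")
    return snapshot


md = "# Knowledge Graph Snapshot\n"
payload = {"success": True, "format": "markdown", "snapshot": md, "length": len(md)}
text = extract_snapshot(payload)
```

Checking `length` is optional, but it costs little and catches a payload that was cut off before it reached the prompt.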
