Commit f162a7a

SonAIengine and claude committed
feat+revert(v0.22): domain enumeration in graph context; strategy text reverted
PHASE 2.1 (kept): build_graph_context now emits a "Domains in this corpus: krra (90125), x2bee (19843), assort (13909)" line whenever the corpus has 2+ distinct properties._domain_id values. Direct SQL DISTINCT (not list_nodes sampling) so the count is accurate even when nodes are stored in domain-contiguous order.

PHASE 2.2 (reverted strategy text, kept enumeration): tested a 3-step strategy prompt block ("identify domains → fan-out → verify"). Re-bench cross-domain shows: v0.22 hits 3/12 (same as v0.21), but degraded partial coverage:

  xd001: miss → hit (assort=6 unlocked) ✓
  xd008: hit → miss (assort=10 → 0) ✗
  xd010: 2/3 → 1/3 partial coverage ✗
  xd012: krra=83 → found=0 catastrophic ✗

Net zero hits, net negative on partial coverage — same brittle deterministic-prompt-shift dynamic seen in v0.20. Strategy block reverted; only the factual enumeration line remains (zero behavioural push, agent CAN read it but isn't being steered).

Pattern (3 iterations now): adding text to AGENT_SYSTEM at temp=0/seed=42 is a coin-flip per query. v0.19 +4, v0.20 -5 (reverted as v0.20.1 +1), v0.22 +0.

Conclusion: prompt-tuning won't move cross-domain coverage. Need tool-level fan-out (Phase 2.3): modify search/deep_search to split into per-domain sub-searches internally so the agent makes one call and gets multi-domain results without behaviour change.

Tests: 983 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent b2f94dc commit f162a7a

2 files changed

Lines changed: 76 additions & 0 deletions

CHANGELOG.md

Lines changed: 38 additions & 0 deletions
@@ -6,6 +6,44 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 
 ## [Unreleased]
 
+### Measured — v0.22 Phase 2.1/2.2: domain-aware prompt is NET ZERO (reverted strategy text)
+
+Tried lifting cross_domain coverage by adding multi-domain awareness
+to the agent system prompt — both the factual enumeration of
+distinct ``_domain_id`` values AND a 3-step strategy block telling
+the agent to identify domains, fan out parallel searches, verify
+coverage. Same dynamic seen in v0.20: the prompt addition rerouted
+deterministic decoding paths and broke as many queries as it helped.
+
+| qid | v0.21 baseline | **v0.22 with strategy** | delta |
+|---|---|---|---|
+| xd001 | miss (krra=112, assort=0) | **hit** (krra=136, assort=6) | flipped to hit ✓ |
+| xd008 | **hit** (krra=100, assort=10) | miss (krra=48, assort=0) | flipped to miss ✗ |
+| xd010 | partial (krra+x2bee 2/3) | worse (krra only 1/3) | regression |
+| xd012 | partial (krra=83) | **found=0** | catastrophic |
+| **total hits** | **3 / 12** | **3 / 12** | **0** |
+
+Same hit count, worse partial-coverage signal on 3-domain queries.
+This is now the third iteration where adding text to the agent
+system prompt at temp=0/seed=42 has produced net-neutral or net-
+negative results (v0.20 cursor follow-through was −5; v0.22 domain
+strategy is −0 hits / −2 partial).
+
+**Conclusion**: agent prompt tuning is fundamentally unreliable at
+deterministic sampling. Each prompt change is a coin-flip per query.
+The factual domain enumeration is preserved (zero behavioural risk —
+the agent CAN see ``Domains in this corpus: krra (90125), x2bee
+(19843), assort (13909)`` in the graph metadata block) but the 3-step
+strategy block is reverted.
+
+**Phase 2.3 hypothesis** (next): tool-level fan-out instead of prompt
+instruction. Modify ``search``/``deep_search`` to detect a multi-
+domain corpus and internally split into per-domain sub-searches.
+Agent makes one call, gets multi-domain results without any
+behavioural change. The per-domain sub-search bypasses the FTS
+ranking bias that lets a single dominant category (KRRA's ``ESG 및
+지속가능성``) crowd out content from other domains.
+
 ### Measured — v0.21 Phase 1.5/1.6: cross-domain federation bench (3/12 = 25 % baseline)
 
 End-to-end demo of the Phase 1 stack — Phase 1.4 MetaCorpus combiner +
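
Review note on the Phase 2.3 hypothesis in the changelog entry above: a minimal sketch of what tool-level fan-out could look like, under stated assumptions. The names `fan_out_search`, `toy_search`, the `Hit` shape, and the round-robin merge are all hypothetical; this commit does not implement Phase 2.3, and the real `search`/`deep_search` signatures are not shown in the diff. The toy corpus only illustrates how a global top-k can be filled entirely by one dominant domain while a per-domain split still surfaces the others.

```python
from itertools import zip_longest
from typing import Callable, Iterable

# Hypothetical hit shape for illustration: {"id": str, "domain": str, "score": float}
Hit = dict


def fan_out_search(
    query: str,
    domains: Iterable[str],
    search_one: Callable[[str, str, int], list[Hit]],
    per_domain: int = 5,
) -> list[Hit]:
    """Run one sub-search per domain, then interleave results round-robin."""
    buckets = [search_one(query, d, per_domain)[:per_domain] for d in domains]
    merged: list[Hit] = []
    # Take the best hit of every domain first, then the second best, and so on,
    # so that no single domain can occupy all of the leading positions.
    for row in zip_longest(*buckets):
        merged.extend(h for h in row if h is not None)
    return merged


# Toy corpus: "krra" dominates the global ranking (assumed scores).
CORPUS = [
    {"id": "k1", "domain": "krra", "score": 0.9},
    {"id": "k2", "domain": "krra", "score": 0.8},
    {"id": "k3", "domain": "krra", "score": 0.7},
    {"id": "x1", "domain": "x2bee", "score": 0.4},
    {"id": "a1", "domain": "assort", "score": 0.3},
]


def toy_search(query: str, domain: str, limit: int) -> list[Hit]:
    """Stand-in for a domain-filtered search; ignores the query text."""
    hits = [h for h in CORPUS if h["domain"] == domain]
    return sorted(hits, key=lambda h: h["score"], reverse=True)[:limit]


# A single global top-3 contains only the dominant domain.
global_top3 = sorted(CORPUS, key=lambda h: h["score"], reverse=True)[:3]

# The fan-out keeps at least one hit from every domain that has any.
fanned = fan_out_search("esg", ["krra", "x2bee", "assort"], toy_search, per_domain=2)
```

Round-robin interleaving is only one possible merge policy; a score-normalised merge would be another option, and which one the bench favours is an open question.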

src/synaptic/search_session.py

Lines changed: 38 additions & 0 deletions
@@ -236,6 +236,34 @@ async def build_graph_context(backend: StorageBackend) -> str:
     # Total counts
     total_docs = await backend.count_nodes(kind=None)
 
+    # Per-domain breakdown (Phase 2.1) — surface distinct
+    # ``properties._domain_id`` values so the agent can plan
+    # cross-domain queries. Empty when the corpus has only one
+    # (or no) domain tag — back-compat with single-domain corpora.
+    domains_summary = ""
+    try:
+        # Direct SQL avoids the list_nodes(limit=10K) sampling bias
+        # that returns one domain's worth of contiguous rows on a
+        # MetaCorpus where domains are ordered. Falls back silently
+        # for non-sqlite backends.
+        db_method = getattr(backend, "_db", None)
+        domain_counts: dict[str, int] = {}
+        if callable(db_method):
+            db = db_method()
+            cur = await db.execute(
+                "SELECT json_extract(properties_json, '$._domain_id') AS dom, COUNT(*) "
+                "FROM syn_nodes WHERE dom IS NOT NULL GROUP BY dom ORDER BY 2 DESC LIMIT 10"
+            )
+            for dom, cnt in await cur.fetchall():
+                if dom:
+                    domain_counts[dom] = int(cnt)
+            await cur.close()
+        if len(domain_counts) >= 2:
+            parts = [f"{d} ({c})" for d, c in domain_counts.items()]
+            domains_summary = ", ".join(parts)
+    except Exception:
+        pass
+
     # Count nodes by kind to distinguish document vs structured graphs.
     # Structured entities are identified by the ``_table_name`` property
     # stamped by TableIngester / DbIngester — raw ENTITY nodes from
@@ -256,6 +284,16 @@ async def build_graph_context(backend: StorageBackend) -> str:
         "Use category names above as the 'category' parameter in search.",
     ]
 
+    # Multi-domain corpus → just enumerate the domains. Verbose
+    # strategy text was tested in v0.22 and net-zero on hit-rate
+    # while degrading partial coverage on 3-domain queries — same
+    # deterministic-prompt-shift dynamic seen at v0.20. Keeping
+    # only the factual enumeration so the agent CAN see them
+    # without being pushed toward a specific decoding path.
+    if domains_summary:
+        lines.append("")
+        lines.append(f"Domains in this corpus: {domains_summary}")
+
     # --- Structured data: table schemas ---
     # Detect tables from _table_name property and sample columns.
     structured_row_count = 0
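
Review note on the domain-count query added in the first hunk: the sketch below runs the same SQL against an in-memory database, which may be handy as a regression test. It rests on several assumptions. It uses the synchronous stdlib `sqlite3` module rather than the async handle returned by `backend._db()`. The two-column `syn_nodes` schema is a guess containing only the column the query names. `domain_summary` is a hypothetical helper, not a function in the repository. `json_extract` needs an SQLite build with the JSON functions enabled, which is the common case for recent Python distributions.

```python
import json
import sqlite3


def domain_summary(conn: sqlite3.Connection, min_domains: int = 2) -> str:
    """Return 'dom (count), ...' ordered by count, or '' below min_domains."""
    rows = conn.execute(
        "SELECT json_extract(properties_json, '$._domain_id') AS dom, COUNT(*) "
        "FROM syn_nodes WHERE dom IS NOT NULL GROUP BY dom ORDER BY 2 DESC LIMIT 10"
    ).fetchall()
    # dict preserves insertion order, so the ORDER BY ranking is kept.
    counts = {dom: int(cnt) for dom, cnt in rows if dom}
    if len(counts) < min_domains:
        return ""
    return ", ".join(f"{d} ({c})" for d, c in counts.items())


def make_corpus(domain_sizes: dict[str, int]) -> sqlite3.Connection:
    """Build a toy syn_nodes table (assumed schema) with one untagged row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE syn_nodes (id INTEGER PRIMARY KEY, properties_json TEXT)"
    )
    for dom, n in domain_sizes.items():
        conn.executemany(
            "INSERT INTO syn_nodes (properties_json) VALUES (?)",
            [(json.dumps({"_domain_id": dom}),)] * n,
        )
    # A node without _domain_id: json_extract yields NULL and the row is skipped.
    conn.execute(
        "INSERT INTO syn_nodes (properties_json) VALUES (?)",
        (json.dumps({"title": "untagged"}),),
    )
    return conn


multi = domain_summary(make_corpus({"krra": 3, "x2bee": 2, "assort": 1}))
single = domain_summary(make_corpus({"krra": 4}))
```

The query filters on the `dom` alias inside `WHERE`; SQLite accepts that, but it is not portable SQL, so other backends would need the `json_extract(...)` expression repeated there.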
