Let
We say
- the per-document term frequencies $\mathrm{tf}(t, d)$ for every term $t$ in $q$,
- the per-corpus document frequencies $\mathrm{df}(t, C)$ for every term $t$ in $q$,
- the document length $|d|$ and the average document length $\bar{\ell}(C)$,
- a fixed set of hyperparameters $(k_1, b, \text{title boost}, \ldots)$.
Standard BM25, BM25F, and hybrid lexical-BM25 scoring are all BM25-shaped.
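To make the BM25-shaped property concrete, the sketch below computes a textbook Okapi BM25 score from the kinds of inputs listed above. The names `CorpusStats`, `corpus_stats` and `bm25_score` are hypothetical and not part of Synaptic's API. The IDF variant and the use of the corpus size as an extra corpus-level statistic are assumptions, since the source does not specify them.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CorpusStats:
    """Corpus-level statistics (hypothetical container, not Synaptic's API)."""
    n_docs: int      # number of documents in the corpus C (assumed extra input for IDF)
    df: dict         # term -> document frequency df(t, C)
    avg_len: float   # average document length


def corpus_stats(docs):
    """Derive the statistics from the final corpus only; ingest order never enters."""
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    total_len = sum(len(tokens) for tokens in docs.values())
    return CorpusStats(len(docs), dict(df), total_len / len(docs))


def bm25_score(query, doc_tokens, stats, k1=1.2, b=0.75):
    """Textbook Okapi BM25 with an assumed Lucene-style IDF."""
    tf = Counter(doc_tokens)
    norm = k1 * (1.0 - b + b * len(doc_tokens) / stats.avg_len)
    score = 0.0
    for term in query:
        f = tf.get(term, 0)
        if f == 0:
            continue
        n_t = stats.df.get(term, 0)
        idf = math.log(1.0 + (stats.n_docs - n_t + 0.5) / (n_t + 0.5))
        score += idf * f * (k1 + 1.0) / (f + norm)
    return score


docs = {"a": ["x", "y"], "b": ["y", "z"]}
stats = corpus_stats(docs)
score_a = bm25_score(["x"], docs["a"], stats)
score_b = bm25_score(["x"], docs["b"], stats)
```

Every quantity the score reads is a function of the query, the document and the final corpus, which is the property the definition relies on.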
An ingest schedule is an ordered partition
Synaptic's ingest satisfies two engineering guarantees, baked into the graph schema:
G1. Deterministic node IDs.
There exists a function $\phi$ that assigns every document $d$ a node ID $\phi(d)$ deterministically. See src/synaptic/extensions/cdc/state.py and tests/test_cdc_search_regression.py for the concrete realisation.
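One way a deterministic node ID in the sense of G1 could be derived is a name-based UUID over a stable source key. This is a sketch only: the key choice (source table plus primary key), the namespace URL and the function name `node_id` are assumptions for illustration, and the actual realisation lives in the files cited above.

```python
import uuid

# Hypothetical fixed namespace; any constant UUID works as long as it never changes.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/synaptic-sketch")


def node_id(table: str, primary_key: str) -> str:
    """Map a source row to the same ID on every ingest, independent of order or time."""
    # The unit separator avoids collisions such as ("ab", "c") vs ("a", "bc").
    return str(uuid.uuid5(NAMESPACE, f"{table}\x1f{primary_key}"))


first_seen = node_id("orders", "42")
seen_again = node_id("orders", "42")
other_row = node_id("orders", "43")
```

Because the ID depends only on the source key, a re-delivered document lands on the node it already occupies instead of creating a duplicate.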
G2. Idempotent upsert.
For any document
Let
Fix any query
Guarantee G1 ensures that each document
The theorem gives set equality: whenever two documents
dict iteration order in MemoryBackend). A strictly ordered
invariance would require an insertion-order-free tie-break (e.g.
secondary sort by $\phi(d)$). See Section 4 of the paper for
empirical characterization: ~98.5 % set agreement and ~52 % exact
order agreement on 200 Allganize RAG-ko queries,
Let
where
For queries where no tie exists at the first-relevant position,
Theorem 1 gives identical reciprocal-rank contributions. For queries
in
examples/ablation/streaming_experiment.py checks Theorem 1 empirically on
Allganize RAG-ko with:
- $|\mathcal{D}| = 200$ documents,
- $|Q| = 200$ queries,
- $\Sigma_1$ = one batch of all 200 documents,
- $\Sigma_2$ = 10 shuffled batches of ~20 documents each (seed 42).
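The two schedules can be mimicked on a toy corpus. The sketch below is not the experiment script and does not use the Allganize data. It builds 200 synthetic documents, ingests them once as a single batch and once as shuffled batches of 20 (plus a re-delivered batch to exercise idempotent upsert), and compares the resulting BM25 scores. The dict-backed store and all function names are hypothetical.

```python
import math
import random
from collections import Counter


def upsert(store, doc_id, tokens):
    """Idempotent upsert: re-ingesting the same document leaves the store unchanged."""
    store[doc_id] = tuple(tokens)


def ingest(batches):
    """Apply an ingest schedule (a list of batches) to an empty store."""
    store = {}
    for batch in batches:
        for doc_id, tokens in batch:
            upsert(store, doc_id, tokens)
    return store


def bm25_scores(store, query, k1=1.2, b=0.75):
    """Score every document using statistics of the final store only."""
    n = len(store)
    avg_len = sum(len(t) for t in store.values()) / n
    df = Counter(term for tokens in store.values() for term in set(tokens))
    scores = {}
    for doc_id, tokens in store.items():
        tf = Counter(tokens)
        norm = k1 * (1 - b + b * len(tokens) / avg_len)
        s = 0.0
        for term in query:
            if tf[term]:
                idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
                s += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores[doc_id] = s
    return scores


def top_k(scores, k=10):
    """Rank by descending score with the document ID as a deterministic tie-break."""
    return sorted(scores, key=lambda d: (-scores[d], d))[:k]


# Synthetic corpus: 200 documents over a 30-word vocabulary.
rng = random.Random(0)
vocab = [f"w{i}" for i in range(30)]
docs = [(f"doc-{i:03d}", [rng.choice(vocab) for _ in range(rng.randint(5, 15))])
        for i in range(200)]

sigma_1 = [docs]                                  # one batch of everything
shuffled = docs[:]
random.Random(42).shuffle(shuffled)
sigma_2 = [shuffled[i:i + 20] for i in range(0, len(shuffled), 20)]
sigma_2.append(shuffled[:5])                      # re-delivery, as a CDC feed might do

store_1, store_2 = ingest(sigma_1), ingest(sigma_2)
query = ["w1", "w7"]
scores_1 = bm25_scores(store_1, query)
scores_2 = bm25_scores(store_2, query)
```

In this toy setting both schedules yield the same store contents, hence the same score for every document. The rankings coincide here only because `top_k` breaks ties by document ID; a tie-break that follows store iteration order could still reorder equal-scoring documents.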
Result (locked in as of v0.16.0, 2026-04-17):
| Quantity | v0.15.x (legacy engine) | v0.16.0 (evidence engine) |
|---|---|---|
| Set-equal top-10 | 197 / 200 (98.5 %) | 197 / 200 (98.5 %) |
| Exact-ordered top-10 | 103 / 200 (51.5 %) | 192 / 200 (96.0 %) |
| Top-1 identical | 109 / 200 (54.5 %) | 200 / 200 (100 %) |
| MRR (batch) | 0.7434 | 0.9468 |
| MRR (streaming) | 0.7334 | 0.9468 |
| $\lvert\Delta\mathrm{MRR}\rvert$ (batch vs. streaming) | 0.0100 | 0.0000 |
On the v0.16.0 default engine the invariance is exact on MRR and on top-1 rank. Set equality holds at the same 98.5 % as on the legacy engine, with the remaining disagreements at rank 9 or 10 (Corollary 1's tie-break regime). In other words, the theorem's set-invariance claim is borne out up to ties at the top-10 boundary; the order drift seen on the legacy engine (51.5 % exact-order agreement, MRR gap of 0.0100) was an artefact of the legacy scoring cascade's tie-break sensitivity, not a structural violation of the theorem.
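The agreement quantities in the table can be computed from two sets of rankings as in the sketch below. The function names and the toy rankings are illustrative assumptions. Whether the experiment truncates reciprocal rank at a cutoff is not stated in the source, so this version scans the full ranking.

```python
def reciprocal_rank(ranking, relevant):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mrr(rankings, relevant):
    """Mean reciprocal rank over all queries."""
    return sum(reciprocal_rank(rankings[q], relevant[q]) for q in rankings) / len(rankings)


def agreement(batch, streaming, k=10):
    """Fraction of queries whose top-k agree as sets, as ordered lists, and at rank 1."""
    n = len(batch)
    set_equal = sum(set(batch[q][:k]) == set(streaming[q][:k]) for q in batch)
    exact = sum(batch[q][:k] == streaming[q][:k] for q in batch)
    top1 = sum(batch[q][:1] == streaming[q][:1] for q in batch)
    return {"set_equal": set_equal / n, "exact_order": exact / n, "top1": top1 / n}


# Toy rankings: q1 has a tie between "b" and "c" that the two runs break differently.
batch = {"q1": ["a", "b", "c"], "q2": ["d", "e", "f"]}
streaming = {"q1": ["a", "c", "b"], "q2": ["d", "e", "f"]}
relevant = {"q1": {"b"}, "q2": {"d"}}

report = agreement(batch, streaming, k=3)
mrr_batch = mrr(batch, relevant)
mrr_streaming = mrr(streaming, relevant)
```

In the toy data the top-3 sets agree for both queries while the ordered lists agree for only one. MRR still differs, because the swapped pair contains the first relevant document. This is the same pattern as the legacy column: high set agreement with a non-zero MRR gap.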
- No re-indexing on corpus growth. Under Theorem 1, a production index built up incrementally via SynapticGraph.sync_from_database(dsn) returns the same set of top-$k$ hits as a fresh rebuild, so nightly reindex jobs are unnecessary for retrieval quality.
- Reproducibility under deletion. If the source DB deletes a document, guarantee G1 means the corresponding node is purged, not left stale. Since the theorem applies to the final corpus $\mathcal{D}$ after deletion, the invariance holds.
- Orthogonal to LLM-extracted graphs. Theorem 1 depends on G1/G2 only. Systems that embed LLM outputs into edges (GraphRAG, Cognee, HippoRAG) cannot guarantee G2 in general: a second LLM call on the same document may yield different relations, and so a re-ingest mutates the graph non-trivially. This is a structural reason why CDC-style streaming is hard for LLM-extracted RAG systems, not merely an engineering gap.
- The BM25-shaped assumption covers lexical BM25, BM25F, and their hybrid lexical variants. It does not cover retrieval scores that depend on a global embedding manifold re-trained per ingest (e.g. learned sparse retrievers that update tokenizer weights on every batch). Extending the theorem to such scoring functions is future work.
- Tie-break order invariance requires a $\phi$-ordered secondary sort. We propose this as v0.16.0 roadmap work; the current implementation exhibits the 1.5 % order gap observed above.
- PPR contributions to the score depend on the graph topology induced by ingest. For FK-driven RELATED edges the topology is a deterministic function of the final corpus (no invariance issue); for MENTIONS edges the DF threshold governs edge creation and is in turn a function of the final corpus, so invariance holds. Verifying this for every edge kind is a checklist in tests/test_cdc_search_regression.py.
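The $\phi$-ordered secondary sort named in the tie-break limitation can be sketched as follows. It contrasts a ranking that inherits dict insertion order on ties with one that breaks ties by node ID. Both function names are hypothetical and neither is taken from Synaptic's code. The sketch treats only exactly equal scores as ties.

```python
def naive_rank(scores, k=10):
    """Stable sort on score alone: tied documents keep the dict's insertion order."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def id_ordered_rank(scores, k=10):
    """Descending score, then ascending node ID: independent of insertion order."""
    return sorted(scores, key=lambda node_id: (-scores[node_id], node_id))[:k]


# Same scores, inserted in two different orders (as two ingest schedules might do).
first = {"n2": 1.0, "n1": 1.0, "n3": 0.5}
second = {"n1": 1.0, "n3": 0.5, "n2": 1.0}

naive_first, naive_second = naive_rank(first), naive_rank(second)
stable_first, stable_second = id_ordered_rank(first), id_ordered_rank(second)
```

The naive ranking differs between the two insertion orders although the scores are identical, whereas the ID-ordered ranking does not. Scores that differ only by floating-point rounding would need a tolerance or a quantisation step, which this sketch does not attempt.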