Commit 9b633a1
docs: rewrite ROADMAP.md + add Backfill section to CONCEPTS.md
## docs/ROADMAP.md — full rewrite
The old file was pinned to v0.1.0 → v0.2.0 → v0.3.0 planning
from many months ago. Every task in it (PostgreSQL backend,
HNSW, MCP server, Hybrid search, PPR, etc.) shipped several
minor releases ago. The file was actively misleading: new
contributors would read "current status: v0.1.0" and assume
half the features don't exist yet.
New layout:
- **Current status (v0.15.0)** — PyPI version, test count,
tool count, plus a compact table of what each v0.14.x → v0.15.0
release shipped.
- **v0.16.0 — next minor** — concrete tasks grouped into four
priority buckets:
  - P1: flip `graph.search(engine=)` default from `"legacy"` to
    `"evidence"`, update tests that depended on legacy stages,
    bench regression check.
  - P2: re-measure embedder+reranker and agent baselines (still
    v0.13.0-era numbers in CLAUDE.md because v0.14.x search path
    changes broke their meaning).
  - P3: CDC schema drift detection. `schema_fingerprint` is
    already persisted but never compared, so ALTER TABLE events
    go unnoticed.
  - P4: PostgreSQL backend feature parity with SQLite (HNSW
    persist, CDC tables, etc).
- **v0.17.0** — legacy HybridSearch removal, self-calibrating
cosine probe (replaces the last magic-number default), Oracle
/ MSSQL CDC.
- **v0.18.0+ — long-term, unconfirmed** — LLM-as-Judge bench
mode, CI bench regression gate, Doc2Query++, ColBERT late
interaction, observability, cost tracking, multi-tenant.
- **Completed (historical)** — brief pointer to the v0.1 → v0.12
history plus the v0.13 → v0.15 detail table that the old file
missed entirely.
- **Design principles** — 8 rules (was 6), added "LLM-free
indexing" and the v0.14.x lesson "silent failure is a bug".
The design-principles section now explicitly codifies the
lesson the whole v0.14.x series chased: a feature that exists
in code but has no wiring is a bug, not a feature. Future
reviewers should treat it that way.
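
For illustration only (this commit changes no code), the P3 roadmap item above can be sketched as a fingerprint comparison. This is a minimal sketch under stated assumptions: the helper names `schema_fingerprint()` and `has_drifted()`, the choice of SHA-256, and the column attributes that are hashed are all hypothetical, and the project's persisted `schema_fingerprint` may be computed differently.

```python
import hashlib
import sqlite3


def schema_fingerprint(conn: sqlite3.Connection, table: str) -> str:
    """Hash the column layout of `table` (hypothetical fingerprint recipe)."""
    # PRAGMA table_info yields: cid, name, type, notnull, dflt_value, pk
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    canonical = "|".join(
        f"{name}:{ctype}:{notnull}:{pk}"
        for _cid, name, ctype, notnull, _default, pk in rows
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def has_drifted(conn: sqlite3.Connection, table: str, persisted: str) -> bool:
    """Compare the live schema against the fingerprint stored at ingest time."""
    return schema_fingerprint(conn, table) != persisted


# Demo on an in-memory database: persist once, then alter the table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
persisted = schema_fingerprint(conn, "orders")

drift_before = has_drifted(conn, "orders", persisted)
conn.execute("ALTER TABLE orders ADD COLUMN currency TEXT")
drift_after = has_drifted(conn, "orders", persisted)
```

In this sketch the comparison is the missing step: persisting the fingerprint alone detects nothing until something reads it back on the next sync.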
## docs/CONCEPTS.md — new section 11 "Backfill"
Inserted between the existing CDC section (10) and the
limitations section (which renumbers from 11 → 12). Covers:
- **Why it exists** — the two silent-failure modes v0.14.x
uncovered (MCP PhraseExtractor wiring gap, empty embeddings
when embedder is added post-ingest) and why re-ingest from
source is often impractical.
- **Two passes** — embedding backfill (batch via
`embedder.embed_batch`) and phrase-hub backfill (walk nodes
with no outgoing CONTAINS, run the extractor, create hub
edges). Both idempotent, best-effort, bounded by `max_nodes`.
- **BackfillResult** — full dataclass reference.
- **Wiring preconditions** — backfill does not fabricate
missing components; user has to wire the embedder /
phrase_extractor first, then call backfill. Explicitly not a
magic "fix everything" button.
- **MCP tool** — `knowledge_backfill(scope, batch_size, max_nodes)`
with sample request/response.
- **Why phrase backfill matters** — explains how the
ChunkEntityIndex + PPR dead-path problem manifests and why
filling the hubs flips cross-document search back on.
- **Limitations** — backfill is not a CDC replacement, embedder
swap requires force re-embed (future `force=True` flag
mentioned), phrase-hub quality depends on extractor locale.
- **Design principle** — explicit note that backfill is a
*recovery* tool, not a new feature. Future silent-failure
categories can land as additional passes on the same function.
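
The two passes and the wiring preconditions described above can be sketched roughly as follows. This is not the project's implementation: the `Graph`, `Node`, and `BackfillResult` shapes, their field names, and the `backfill()` signature are hypothetical stand-ins. Only `embedder.embed_batch`, `max_nodes`, `batch_size`, and the "no outgoing CONTAINS" rule come from the text, and best-effort error handling is left out for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional, Protocol


class Embedder(Protocol):
    def embed_batch(self, texts: list[str]) -> list[list[float]]: ...


@dataclass
class Node:
    id: str
    text: str
    embedding: Optional[list[float]] = None


@dataclass
class BackfillResult:  # hypothetical fields, not the documented dataclass
    embedded: int = 0
    hubs_created: int = 0
    skipped: list[str] = field(default_factory=list)


@dataclass
class Graph:  # hypothetical in-memory stand-in for the real graph
    nodes: dict[str, Node] = field(default_factory=dict)
    contains: dict[str, set[str]] = field(default_factory=dict)  # node id -> hubs
    embedder: Optional[Embedder] = None
    phrase_extractor: Optional[Callable[[str], list[str]]] = None


def backfill(graph: Graph, batch_size: int = 64, max_nodes: int = 1000) -> BackfillResult:
    result = BackfillResult()

    # Pass 1: embed only nodes that have no vector yet (idempotent).
    if graph.embedder is None:
        result.skipped.append("embeddings: no embedder wired")
    else:
        pending = [n for n in graph.nodes.values() if n.embedding is None][:max_nodes]
        for start in range(0, len(pending), batch_size):
            batch = pending[start:start + batch_size]
            vectors = graph.embedder.embed_batch([n.text for n in batch])
            for node, vector in zip(batch, vectors):
                node.embedding = vector
                result.embedded += 1

    # Pass 2: add phrase hubs only for nodes with no outgoing CONTAINS edge.
    if graph.phrase_extractor is None:
        result.skipped.append("phrases: no phrase_extractor wired")
    else:
        pending = [n for n in graph.nodes.values() if not graph.contains.get(n.id)]
        for node in pending[:max_nodes]:
            hubs = set(graph.phrase_extractor(node.text))
            if hubs:
                graph.contains[node.id] = hubs
                result.hubs_created += len(hubs)

    return result


class FakeEmbedder:
    """Toy embedder used only for this demo."""

    def embed_batch(self, texts: list[str]) -> list[list[float]]:
        return [[float(len(t))] for t in texts]


graph = Graph(nodes={"a": Node("a", "alpha beta"), "b": Node("b", "gamma")})
unwired = backfill(graph)          # nothing wired: both passes are skipped
graph.embedder = FakeEmbedder()
graph.phrase_extractor = str.split
first = backfill(graph)            # fills both gaps
second = backfill(graph)           # second run finds nothing left to do
```

Running `backfill()` before wiring reports both passes as skipped rather than inventing components, and the second wired run is a no-op, which is the idempotence the section describes.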
No code changes, no test impact.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
1 parent f5822aa commit 9b633a1
2 files changed
Lines changed: 300 additions & 95 deletions