Proposal: Write Gate server — schema + provenance enforcement for agent-authored MERGE/SET #159

@injectedfusion

Problem

Once MCP_READ_ONLY is flipped to allow writes, ai-toolkit has no further constraint on what an agent puts into Memgraph. The current write-safety surface is a single regex (is_write_query) + the MCP_READ_ONLY boolean in integrations/mcp-memgraph/src/mcp_memgraph/servers/server.py. That's binary allow/deny on the verb-set.

In the agentic Knowledge-Graph-Augmented Generation (KGAG) case — autonomous agents writing to a graph-backed memory across many sessions, often multiple agents sharing a graph — two failure modes dominate:

  1. Schema drift. Agents hallucinate labels and property names per call. The same concept ends up as :Person, :person, :User, :User_profile. Required properties get missed. Issue #127 ("every relationship in KG extraction is :DIRECTED") is one symptom of this class.
  2. Absent provenance. Writes carry no source, no extraction_method, no confidence. Downstream consumers can't distinguish a parsed API response from an LLM-extracted claim from an outright hallucination.

These compound. A graph of 10k nodes from 50 agent sessions becomes unqueryable in weeks — not because the data is wrong, but because nothing constrains how it was written. See Context Rot for the decay pattern this produces.

The toolkit already has schema introspection (get_schema, get_constraint, get_index). The missing piece is schema + provenance enforcement at write time.

Proposal

Add a Write Gate server variant to ai-toolkit's MCP plugin registry. Two invariants, enforced together:

  1. Schema validation — every write is matched against a registered schema before MERGE
  2. Computed provenance — every write stamps source, extraction_method, confidence (computed, not declared), and a gate version marker

The gate is neutral machinery. Consumers decide what their schema contains and what their confidence formula is; the gate enforces the shape and computes the outputs.

Why a new server variant, not a flag on the existing server?

Valid question. The argument for a flag: simpler, smaller surface. The argument for a variant:

  • Tool surface isolation. The default server exposes run_query — a Cypher pass-through. The Write Gate exposes typed tools (write_node, write_relationship) that don't allow raw Cypher writes. Agents using the gate should not have run_query in their tool list; that defeats the enforcement. Separate servers let the operator hand out the right tool surface per agent.
  • Independent configuration. Schema registry source, formula provider, error behavior — these are gate-specific config. Bolting them onto the existing server's env var surface crowds it.
  • Backwards compatibility. Existing users running the current server see no change. The gate is opt-in via AVAILABLE_SERVERS.

The plugin registry explicitly anticipates this. In servers/__init__.py:

AVAILABLE_SERVERS: Dict[str, Dict[str, Any]] = {
    "server": { ... },
    "memgraph-experimental": { ... },
    # Future servers can be added here:
    # "hygm": {
    #     "module": "mcp_memgraph.servers.hygm",
    #     ...
    # },
}

The Write Gate registers as one more entry in that dict.

Interface spec

Three MCP tools, following the toolkit's snake_case verb-object convention.

write_node

Input:
  label: str                                    # Target node label
  merge_keys: dict[str, str|int|float|bool]     # Identity for MERGE (scalars only)
  properties: dict[str, Any]                    # Data (no protected fields allowed)
  source: str                                   # Provenance: where the claim came from
  extraction_method: str                        # Must be in FormulaProvider.allowed_extraction_methods()
  reliability: float = 0.5                      # Clamped to [0.0, 1.0] before formula

Output (success):
  {
    "status": "written",
    "label": str,                               # Canonical (may differ if remapped)
    "merge_keys": dict,
    "confidence": float,                        # In [0.0, 1.0]
    "write_gate_version": str,                  # Semver: "MAJOR.MINOR.PATCH"
    "remapped_from": str | null                 # Present iff schema remap occurred
  }

Output (error):
  {
    "status": "rejected",
    "error_code": str,             # See Error Codes table below
    "message": str,
    "details": dict                # Optional diagnostic info
  }

Emitted Cypher (reference implementation):

MERGE (n:CanonicalLabel {merge_key1: $v1, merge_key2: $v2})
SET n += $properties,
    n.confidence = $computed_confidence,
    n.source = $source,
    n.extraction_method = $extraction_method,
    n.write_gate_version = $gate_version,
    n.last_updated = datetime()
// If remapped:
SET n._schema_remap_from = $original_label

write_relationship

Input:
  type: str                                     # Relationship type (validated against schema)
  from_label: str
  from_keys: dict[str, str|int|float|bool]      # Scalars only
  to_label: str
  to_keys: dict[str, str|int|float|bool]        # Scalars only
  properties: dict[str, Any] = {}               # No protected fields
  source: str
  extraction_method: str                        # Must be in FormulaProvider.allowed_extraction_methods()
  reliability: float = 0.5                      # Clamped to [0.0, 1.0] before formula
  endpoint_policy: str = "fail_if_missing"      # or "merge_endpoints"

Output: { "status": "written" | "rejected", ... }

Endpoint resolution policy (addresses the :DIRECTED everywhere pattern from #127):

  • fail_if_missing (default): If either endpoint node doesn't exist, reject with ENDPOINT_NOT_FOUND. Agent must write the node first. Prevents silent creation of stub nodes.
  • merge_endpoints: If either endpoint is missing, MERGE it by its keys and mark it with a _stub=true flag. Opt-in for ingestion-style workloads that have ordering constraints.
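
The two policies can be sketched against an in-memory stand-in for the graph. Everything here is illustrative: the store layout, the node_id and resolve_endpoints helpers, the intermediate "ok" status, and the shape of details are assumptions, not the proposed implementation.

```python
from typing import Any

# Assumption: nodes are identified by (label, frozen merge keys) in a plain dict.
NodeId = tuple[str, frozenset]


def node_id(label: str, keys: dict[str, Any]) -> NodeId:
    return (label, frozenset(keys.items()))


def resolve_endpoints(
    store: dict[NodeId, dict[str, Any]],
    endpoints: list[tuple[str, dict[str, Any]]],
    policy: str = "fail_if_missing",
) -> dict[str, Any]:
    """Check both relationship endpoints before the relationship is written."""
    missing = [node_id(label, keys) for label, keys in endpoints
               if node_id(label, keys) not in store]
    if not missing:
        return {"status": "ok", "stubs_created": 0}
    if policy == "fail_if_missing":
        # No stub nodes are created; the agent must write the node first.
        return {
            "status": "rejected",
            "error_code": "ENDPOINT_NOT_FOUND",
            "message": "one or more endpoint nodes do not exist",
            "details": {"missing_labels": [label for label, _ in missing]},
        }
    if policy == "merge_endpoints":
        for nid in missing:
            # Create the endpoint from its keys and flag it as a stub.
            store[nid] = {**dict(nid[1]), "_stub": True}
        return {"status": "ok", "stubs_created": len(missing)}
    raise ValueError(f"unknown endpoint_policy: {policy!r}")
```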

refresh_schema_cache

No arguments. Reloads the registered schema from its source without restarting the server. Returns the count of labels loaded, in the form {"loaded": <count>}.

Schema registry interface

Pluggable. The gate ships one reference implementation (graph-backed :Schema nodes); consumers can provide others by implementing:

from dataclasses import dataclass, field
from typing import Protocol

@dataclass(frozen=True)
class SchemaEntry:
    label: str                                     # Canonical label
    required_properties: list[str] = field(default_factory=list)
    remaps_from: list[str] = field(default_factory=list)

class SchemaRegistry(Protocol):
    def get_entry(self, label: str) -> SchemaEntry | None: ...
    def all_canonical_labels(self) -> list[str]: ...
    def remap_target(self, label: str) -> str | None: ...
    def fallback_label(self) -> str | None: ...    # Used when policy=remap and no match

Refresh semantics: refresh_schema_cache loads a new snapshot then swaps the in-memory pointer atomically. In-flight writes complete against the pre-refresh snapshot.
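
A minimal sketch of the load-then-swap behavior, assuming a simplified snapshot type (a dict from label to required properties) and a hypothetical SchemaCache class with an injected loader. It relies on the fact that rebinding an attribute to a fully built object is a single reference assignment in CPython, so readers see either the old snapshot or the new one, never a partially loaded one.

```python
import threading
from typing import Callable

# Assumption: a snapshot is simplified to label -> required property names.
Snapshot = dict[str, list[str]]


class SchemaCache:
    def __init__(self, loader: Callable[[], Snapshot]) -> None:
        self._loader = loader
        self._refresh_lock = threading.Lock()
        self._snapshot: Snapshot = loader()

    def current(self) -> Snapshot:
        # A write grabs this reference once and uses it until it completes.
        return self._snapshot

    def refresh(self) -> dict[str, int]:
        with self._refresh_lock:            # serialize concurrent refreshes
            new_snapshot = self._loader()   # if this raises, the old snapshot stays
            self._snapshot = new_snapshot   # swap the pointer in one assignment
        return {"loaded": len(new_snapshot)}
```

Snapshots are never mutated after the swap, which is what lets an in-flight write keep validating against the reference it already holds.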

Unknown-label policy: controlled by WRITE_GATE_UNKNOWN_LABEL_POLICY env var, values remap (default) or reject.

Graph-backed reference implementation reads:

MATCH (s:Schema {allowed: true})
RETURN s.name AS label,
       s.required_properties AS required_properties,
       s.absorbs AS remaps_from

Confidence formula provider

Also pluggable. The formula provider is responsible for two things: computing confidence and declaring which extraction_method values it accepts.

class FormulaProvider(Protocol):
    def allowed_extraction_methods(self) -> set[str]: ...
    def compute(self, reliability: float, extraction_method: str) -> float: ...

The gate ships a deliberately simple default so the primitive doesn't dictate ideology:

class DefaultFormulaProvider:
    WEIGHTS = {
        "api":    1.0,   # Live API / CLI / deterministic machine output
        "parsed": 0.85,  # Structured doc (YAML, JSON, HCL, Markdown frontmatter)
        "manual": 0.75,  # Human explicitly stated — capped below verified sources
        "llm":    0.60,  # LLM-extracted from unstructured text
    }
    def allowed_extraction_methods(self) -> set[str]:
        return set(self.WEIGHTS.keys())
    def compute(self, reliability: float, extraction_method: str) -> float:
        return reliability * self.WEIGHTS[extraction_method]

Gate-enforced contract around the provider:

  • reliability is clamped to [0.0, 1.0] before the provider is called
  • extraction_method must be in provider.allowed_extraction_methods() or the gate rejects with INVALID_EXTRACTION_METHOD
  • The provider's returned value must be in [0.0, 1.0]; out-of-range returns reject with FORMULA_INVALID_OUTPUT
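
The three rules above could be wrapped around any provider roughly as follows. GateRejection and gated_confidence are hypothetical names for this sketch, and raising an exception (rather than returning the rejected payload directly) is an assumption about how the gate is structured internally.

```python
from typing import Protocol


class FormulaProvider(Protocol):
    def allowed_extraction_methods(self) -> set[str]: ...
    def compute(self, reliability: float, extraction_method: str) -> float: ...


class GateRejection(Exception):
    """Carries an error code from the Error Codes table."""

    def __init__(self, error_code: str, message: str) -> None:
        super().__init__(message)
        self.error_code = error_code


def gated_confidence(
    provider: FormulaProvider, reliability: float, extraction_method: str
) -> float:
    # Rule 2: the method must be one the provider declares.
    if extraction_method not in provider.allowed_extraction_methods():
        raise GateRejection(
            "INVALID_EXTRACTION_METHOD",
            f"{extraction_method!r} is not accepted by the formula provider",
        )
    # Rule 1: clamp before the provider ever sees the value.
    clamped = min(1.0, max(0.0, reliability))
    value = provider.compute(clamped, extraction_method)
    # Rule 3: the provider's output must land in [0.0, 1.0].
    if not 0.0 <= value <= 1.0:
        raise GateRejection(
            "FORMULA_INVALID_OUTPUT",
            f"provider returned {value!r}, expected a value in [0.0, 1.0]",
        )
    return value
```

With the default weights, a "manual" write at reliability 0.9 comes out at 0.9 × 0.75 = 0.675 (up to floating-point rounding), and a declared reliability of 5.0 is clamped to 1.0 before weighting.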

Consumers who want richer models plug in their own provider. See "Appendix: Example formula providers" for patterns other implementations have used.

Protected fields — the gate refuses writes where properties contains any of: confidence, write_gate_version, source, extraction_method, last_updated. These are gate-computed; agents declare inputs, gate produces outputs.

Error codes

| Code | Meaning |
|---|---|
| SCHEMA_UNKNOWN_LABEL | Label not in registry and no remap target (reject-mode only) |
| SCHEMA_MISSING_REQUIRED_PROPERTY | Required property absent from properties + merge_keys |
| SCHEMA_PROTECTED_FIELD | Agent attempted to set a gate-computed field |
| SCHEMA_SOURCE_UNAVAILABLE | Registry load failed (graph unreachable, file missing, etc.) |
| SCHEMA_TYPE_MISMATCH | Property type doesn't match declared schema type |
| ENDPOINT_NOT_FOUND | write_relationship called with fail_if_missing and endpoint absent |
| INVALID_EXTRACTION_METHOD | Value not in FormulaProvider.allowed_extraction_methods() |
| FORMULA_INVALID_OUTPUT | Provider returned a value outside [0.0, 1.0] |

Acceptance criteria

v1 is complete when:

  • New entry in AVAILABLE_SERVERS pointing to servers/write_gate.py
  • write_node, write_relationship, refresh_schema_cache tools registered
  • Graph-backed schema registry reference implementation + one fake/in-memory implementation for tests
  • Default confidence formula provider, with clamp + range validation + protected-fields check
  • _schema_remap_from breadcrumb written on remap (remap-mode is default via WRITE_GATE_UNKNOWN_LABEL_POLICY)
  • endpoint_policy=fail_if_missing is the default for write_relationship (covered by test row 8)
  • Every code in the Error Codes table has a test that returns it
  • Test matrix (below) passes, including the #127 regression (row 7)

Test matrix

Minimum set, covering the invariants:

| # | Input | Expected |
|---|---|---|
| 1 | write_node(label="Person", merge_keys={"name":"Alice"}, properties={"age": 30}, source="test", extraction_method="manual", reliability=0.9) | status=written, confidence=0.675 (0.9 × 0.75), write_gate_version set |
| 2 | Same, with properties={"confidence": 1.0} | SCHEMA_PROTECTED_FIELD |
| 3 | write_node(label=":person", ...) with schema registering :Person as canonical for :person | status=written, label="Person", remapped_from=":person", _schema_remap_from set on node |
| 4 | write_node(label="ZZZNonexistent", ...) in remap-mode | status=written, remapped to fallback (configurable); _schema_remap_from=":ZZZNonexistent" |
| 5 | Same, in reject-mode | SCHEMA_UNKNOWN_LABEL |
| 6 | write_node(label="Person", merge_keys={}, properties={}) where schema requires name | SCHEMA_MISSING_REQUIRED_PROPERTY, details lists name |
| 7 | write_relationship(type="DIRECTED", ...) where :DIRECTED is not a registered type | SCHEMA_UNKNOWN_LABEL (regression for #127) |
| 8 | write_relationship with missing endpoint, endpoint_policy=fail_if_missing | ENDPOINT_NOT_FOUND |
| 9 | Same, with endpoint_policy=merge_endpoints | status=written, endpoint created with _stub=true |
| 10 | refresh_schema_cache() after adding :Event to registry | {"loaded": <N+1>}, subsequent write_node(label="Event", ...) succeeds |

Follow-up (out of scope for v1)

Conflict detection (comparing existing vs incoming confidence to flag silent overwrites) and dedup / fuzzy entity resolution are the natural Phase 2 and Phase 3 additions; I'm willing to author both as follow-up issues once v1 lands. Cross-label entity identity, temporal decay in the comparator, and a MAGE in-process deployment variant are further-out options that should be discussed if there's community pull.

Related art and standards

  • W3C PROV-O: The PROV Ontology — the canonical vocabulary for provenance on the web. Recommended as the reference point for any provenance-field naming choices an implementation makes.
  • Cognee — application-layer AI memory with knowledge-engine self-improvement. Complementary: a Memgraph-native Write Gate is the missing substrate for patterns like Cognee's.
  • Issue #127 — symptom of schema drift; the Write Gate prevents the class.

Appendix: Example formula providers (informational)

The v1 default is intentionally simple. The following grading systems could slot in as alternative FormulaProvider implementations without changing the gate's interface:

  • Admiralty Code — NATO AJP-2.1 two-dimensional grading (source reliability A-F × information credibility 1-6), widely used in Cyber Threat Intelligence
  • Flat declaration — agent declares a 0-1 float, no transformation; simplest option for trusted agent pipelines
  • STIX 2.1 Confidence Scales — OASIS-standardized confidence values with mappings across several qualitative scales (DNI, Admiralty, WEP)
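
As the simplest of these, a flat-declaration provider might look like the sketch below. FlatDeclarationProvider is a hypothetical name, and taking the accepted extraction_method values as a constructor argument is an assumption, since the proposal does not say how such a provider would choose them.

```python
class FlatDeclarationProvider:
    """Passes the agent-declared reliability through unchanged."""

    def __init__(self, methods: set[str]) -> None:
        # Assumption: the operator supplies the accepted method names.
        self._methods = set(methods)

    def allowed_extraction_methods(self) -> set[str]:
        return set(self._methods)

    def compute(self, reliability: float, extraction_method: str) -> float:
        # No weighting: the gate has already clamped reliability to [0.0, 1.0].
        return reliability
```

Because the gate clamps the input and range-checks the output, this provider needs no validation of its own.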

Reference implementation

I have a working implementation of this pattern and will extract the v1 subset from it. Happy to share offline with maintainers if useful during review.
