Reconstruction-based migration tool. Reads a MemPalace stored under
ChromaDB 0.6.x and rebuilds it as a new palace under ChromaDB 1.x.
Scope is narrow and intentionally constrained: one supported version pair, one collection name, tested against a small fixture. Semantic accuracy of the reconstructed palace is not asserted. See section 4 for the exact supported scope.
mempalace-migrator reads a ChromaDB 0.6.x SQLite database directly,
extracts the documents and metadata it can identify, and reconstructs a
new ChromaDB 1.x palace from that extracted material.
It works by reconstruction, not by in-place migration:
- The source palace is opened in SQLite read-only URI mode (`mode=ro`) and is never modified.
- A new, separate target palace is built from scratch using the public ChromaDB `1.x` Python client.
- Extracted records are inserted via the `1.x` client API, which means the embedding function used by `1.x` decides how vectors are produced.
There is no shared format between the two versions. The tool does not upgrade a database; it transcribes what it can read into a new one.
Current status: all five pipeline stages are implemented.
`migrate` runs the full pipeline end-to-end and writes a target palace.
`analyze` remains available for read-only inspection without writing.
The supported scope is narrow: one version pair, one collection name.
See section 4.
- Not an in-place upgrade tool. The source palace is never modified. A partial write to the target is rolled back automatically on pipeline failure, but a completed run cannot be undone without deleting the target directory manually.
- Not a general migration utility. Scope is hard-coded to the single version pair in section 4.
- Not a general ChromaDB conversion utility. It is hard-coded to one source structure and one target version.
- Does not commit to producing a target palace that is equivalent to the source in any semantic sense.
- Does not commit to preserving every record from the source.
- Does not commit to producing semantically accurate output, even when the run exits with code `0`.
- Does not commit to terminating without error on inputs it has not been tested against.
- Not a substitute for taking a backup of the source data.
Read this section before running the tool.
- Embedding vectors are not transferred. ChromaDB `1.x` recomputes embeddings using its own embedding function. Search results in the reconstructed palace will not be identical to those in the source, even when documents are byte-identical.
- Output may pass structural checks but be semantically inaccurate. The reconstructed palace may load, accept queries, and return results that differ from the source in ways the tool cannot detect. Semantic accuracy is not asserted and cannot be checked by the tool.
- Completeness is not assured. Rows that fail per-row integrity checks are excluded from the reconstruction and listed in the report. The tool continues; it does not refuse to produce a partial output.
- The tool checks what it can (PRAGMA integrity, ID uniqueness, document presence, metadata resolvability). It cannot check what it does not know to look for.
- The tool may fail on inputs that other tooling accepts. Detection requires a manifest with a recognised `chromadb_version`. Palaces without one are rejected, even if they are otherwise readable.
- Tested coverage is narrow. Only the version pair listed in section 4 has been exercised. Behaviour on other versions, schemas produced by other tooling, or palaces written by patched ChromaDB builds is undefined.
- The tool refuses to run when a non-empty target directory is supplied. The target path must not exist or must be empty; any other condition causes the pipeline to abort before writing.
- Concurrent access to the source is not detected reliably. The tool refuses to run when an uncheckpointed WAL file is present, but it cannot detect a concurrent reader-writer outside that signal.
- Manifest authenticity is not checked. A forged or stale manifest will be accepted.
- Atomicity has limits. Any failure after the target directory is created triggers a rollback (the partial directory is removed). A completed run that exits `0` cannot be undone without deleting the target directory manually.
- `migrate` writes a manifest file. On success, a file named `reconstruction-target-manifest.json` is written inside the target directory recording provenance: source path, detected format, drawer count, chromadb version, and tool version.
- Empty-dict metadata is coerced. Records whose metadata is an empty dict are stored with `None` metadata in the target palace, because chromadb `1.5.7` rejects empty-dict metadata. The drawer count and id set are preserved; no anomaly is emitted for this coercion. This is a faithful adaptation to the pinned chromadb version, not a data-loss event.
- Detection evidence is not unified with the anomaly model. Detection uses its own `Evidence`/`Contradiction` model internally. Pipeline gate failures are mirrored into `ctx.anomalies` as critical anomalies, but the two structured models are not merged into a single type.
- `inspect` exits `0` when reconstruction is skipped. No target is written; the stages section of the report marks `reconstruct` as `skipped` with `reason: no_target_path`.
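The empty-dict coercion described in the list above amounts to a one-line rule. The helper below is a hypothetical illustration, not the tool's actual function:

```python
def coerce_metadata(metadata):
    """Store None instead of {} because chromadb 1.5.7 rejects
    empty-dict metadata; all other values pass through unchanged."""
    return None if metadata == {} else metadata

print(coerce_metadata({}))                # None
print(coerce_metadata({"wing": "east"}))  # {'wing': 'east'}
print(coerce_metadata(None))              # None
```

Only the metadata value changes; record ids and documents are untouched, which is why the drawer count and id set survive the coercion.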
A run that exits with code 0 means the tool completed without raising
a critical error. It does not mean the reconstructed palace
accurately represents the source.
The tool refuses to run outside this list.
| Source ChromaDB | Target ChromaDB |
|---|---|
| 0.6.3 | >=1.5.7,<2 |

The dependency pin in `pyproject.toml` is `chromadb>=1.5.7,<2`.
Detection accepts palaces whose manifest lists a `chromadb_version`
matching the single source version above (`0.6.3`). Detection also
requires a manifest file (`mempalace-bridge-manifest.json`) in the
source palace directory containing both `compatibility_line` and
`chromadb_version` fields. Without these, the tool aborts before
extraction.
There are no plans for additional version pairs in this repository. Each new pair requires re-validation against real palaces.
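A simplified sketch of this detection gate (not the tool's real code; the manifest field values below are invented for illustration):

```python
import json
import tempfile
from pathlib import Path

SUPPORTED_SOURCE = "0.6.3"  # the single accepted source version

def manifest_accepted(palace_dir: Path) -> bool:
    """The manifest must exist, carry both required fields, and name
    the supported source version; otherwise abort before extraction."""
    path = palace_dir / "mempalace-bridge-manifest.json"
    if not path.is_file():
        return False
    manifest = json.loads(path.read_text())
    if not {"compatibility_line", "chromadb_version"} <= manifest.keys():
        return False
    return manifest["chromadb_version"] == SUPPORTED_SOURCE

palace = Path(tempfile.mkdtemp())
no_manifest = manifest_accepted(palace)    # False: manifest missing
(palace / "mempalace-bridge-manifest.json").write_text(
    json.dumps({"compatibility_line": "0.6", "chromadb_version": "0.6.3"}))
with_manifest = manifest_accepted(palace)  # True
print(no_manifest, with_manifest)
```

Note the deliberate asymmetry: an otherwise readable palace without a manifest is still rejected, per the limitations list above.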
These are the design constraints the codebase is held to. They are described here so that contributors and users understand why the tool behaves as it does.
- Traceability over convenience. Every excluded row, every inconsistency, and every ambiguity is recorded as a structured anomaly in the report. The tool does not silently drop data.
- Explicit reporting over silent success. Each report contains an `explicitly_not_checked` list naming the conditions the tool does not check. Silence in the output is not an assurance.
- Strict boundaries over broad support. Anything outside the supported version pair, or below the required detection confidence, is rejected. There is no `--force` option.
- Read-only by construction. The source database is opened in SQLite `mode=ro`. The target palace is built in a separate location. There is no codepath that writes to the source.
- Failure model is documented, not improvised. Critical conditions raise and abort. Per-row issues are collected and reported. The difference is defined in code, not left to the caller.
Warning: back up the source palace before running this tool. Even though the source is opened read-only, the surrounding workflow (renames, moves, scripted cleanup) is the operator's responsibility.
Warning: do not point the target path at an existing non-empty directory. The tool refuses to write to a non-empty target. The target must not exist or must be an empty directory.
Warning: inspect the report after every run. A successful exit code is not an assurance of accuracy. Semantic accuracy is not asserted.
Install:

```shell
uv venv .venv --python 3.12
uv pip install --python .venv/bin/python -e .
```

Available commands:
```shell
# Read-only: detect format and extract records. No target written.
.venv/bin/mempalace-migrator analyze /path/to/source-palace
.venv/bin/mempalace-migrator analyze /path/to/source-palace --json-output

# Full migration: detect > extract > transform > reconstruct > validate.
# TARGET must not exist or be empty. Partial writes are rolled back on failure.
.venv/bin/mempalace-migrator migrate /path/to/source-palace --target /path/to/new-palace
.venv/bin/mempalace-migrator migrate /path/to/source-palace --target /path/to/new-palace --json-output

# Inspect without writing: detect, extract, transform, validate (no target).
# Parity checks are listed as not-performed (no reconstruction ran).
.venv/bin/mempalace-migrator inspect /path/to/source-palace

# Re-render a JSON report saved from a previous run.
.venv/bin/mempalace-migrator report /path/to/report.json
```

`analyze` detects format and extracts records. No target is written.
Reads SOURCE; writes nothing to disk.
`inspect` runs detection, extraction, transformation, and validation
without writing a target palace. Reconstruction is skipped; parity
checks are listed as not-performed in the report. Exits 0 when no
critical anomaly is present, even when reconstruction is skipped.
Reads SOURCE; writes nothing to disk.
`migrate` runs all five stages and writes a new ChromaDB 1.x palace at
`--target`. The source palace is never modified. A manifest file
(`reconstruction-target-manifest.json`) is written inside the target
directory on success. See section 3 for limitations that remain even
after a run exits 0.
`report` re-renders an existing JSON report file as human-readable text.
No pipeline is executed. Reads one file; writes nothing.
These examples use `python -m mempalace_migrator.cli.main`. After
installation (see section 6), this becomes `.venv/bin/mempalace-migrator`.
```shell
# 1. Check whether your palace is readable (read-only, no writes):
python -m mempalace_migrator.cli.main analyze /path/to/source_palace

# 2. Dry-run: detect, extract, transform, validate — no target written:
python -m mempalace_migrator.cli.main inspect /path/to/source_palace

# 3. Migrate: full pipeline, write a new palace:
python -m mempalace_migrator.cli.main migrate /path/to/source_palace --target /path/to/new_palace

# 4. Save and re-render the report:
python -m mempalace_migrator.cli.main migrate /path/to/source_palace --target /path/to/new_palace --json-output > report.json
python -m mempalace_migrator.cli.main report report.json
```

Detect format and extract records. Read-only; no writes.
Reads: SOURCE directory (SQLite read-only URI mode).
Writes: nothing.
Never touches: the source palace, any existing target directory.
Required flags: none (only the SOURCE positional argument).
Global flags: --json-output, --quiet, --debug.
Report keys populated: detection, extraction, extraction_stats.
Other keys (transformation, reconstruction, validation) are null.
stages and confidence_summary are always present; they reflect which
stages executed or were not reached.
Exit codes this command may produce: 0, 1, 2, 3, 8, 10.
Detect, extract, transform, and validate without writing a target palace.
Reads: SOURCE directory (SQLite read-only URI mode).
Writes: nothing.
Never touches: the source palace, any existing directory.
Required flags: none (only the SOURCE positional argument).
Global flags: --json-output, --quiet, --debug.
Report keys populated: detection, extraction, extraction_stats,
transformation, validation.
reconstruction is null (reconstruction is skipped when no target path
is supplied; stages.reconstruct records reason: no_target_path).
stages and confidence_summary are always present.
Exit codes this command may produce: 0, 1, 2, 3, 4, 7, 8, 10.
Migrate SOURCE palace to a ChromaDB 1.x palace at TARGET.
Reads: SOURCE directory (SQLite read-only URI mode).
Writes: TARGET directory (new ChromaDB 1.x palace) plus a manifest
file TARGET/reconstruction-target-manifest.json.
Never touches: the source palace. If a failure occurs after the
target directory is created, the partial directory is removed (rollback).
Required flags: --target TARGET (the destination directory; must
not exist or must be empty).
Global flags: --json-output, --quiet, --debug.
Report keys populated: all keys in the report shape (see section 8).
Artefacts left on disk: on exit 0, the target palace directory and
TARGET/reconstruction-target-manifest.json. On any non-zero exit after
the target directory was created, the target directory is removed by
the rollback mechanism.
Exit codes this command may produce: 0, 1, 2, 3, 4, 5, 6, 7, 8, 10.
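The `--target` precondition (must not exist, or must be an empty directory) can be sketched as a small check. This is an assumed illustration of the rule, not the tool's code:

```python
import tempfile
from pathlib import Path

def target_path_ok(target: Path) -> bool:
    """Accept a target that does not exist yet, or an existing
    directory with no entries; reject everything else."""
    if not target.exists():
        return True
    return target.is_dir() and not any(target.iterdir())

t = Path(tempfile.mkdtemp())
empty_ok = target_path_ok(t)             # True: empty directory
(t / "stale.txt").write_text("x")
nonempty_ok = target_path_ok(t)          # False: directory has contents
missing_ok = target_path_ok(t / "new")   # True: does not exist yet
print(empty_ok, nonempty_ok, missing_ok)
```

Refusing a non-empty target up front is what makes the rollback safe: the tool can remove the whole target directory on failure without risking pre-existing files.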
Re-render a JSON report produced by any subcommand as text.
Reads: REPORT_FILE (a JSON file produced by a previous analyze,
inspect, or migrate run with --json-output, or saved manually).
Writes: nothing.
Never touches: any palace directory.
Required flags: none (only the REPORT_FILE positional argument).
Global flags: --json-output (re-emits the JSON report unchanged),
--quiet (suppresses output; only exit code is produced), --debug.
Exit codes this command may produce: 0, 1, 8, 9, 10. Exit 9 means
the file could not be read or was not parseable as JSON. Exit codes
2-7 are not reachable from this subcommand; it does not run a
pipeline.
Each run produces a structured report printed to stdout, or as JSON
with --json-output. The report always contains the following top-level
keys:
| Key | Contract |
|---|---|
| `schema_version` | Stable integer (`5`). External consumers may pin on this value to detect format changes. |
| `tool_version` | Tool version string from `pyproject.toml`. |
| `supported_version_pairs` | List of `{source, target}` objects reflecting the version pairs this build accepts. |
| `run_id` | UUID4 string. Unique per run; safe for cross-referencing logs and reports. |
| `started_at` | UTC ISO 8601 timestamp (seconds precision, `Z` suffix). |
| `completed_at` | UTC ISO 8601 timestamp (seconds precision, `Z` suffix). Always present, even on failure. |
| `outcome` | `"success"` or `"failure"`. |
| `failure` | `null` on success. On failure: `{stage, code, summary, details}` object. |
| `input` | `{source_path, target_path}`. `target_path` is `null` when no `--target` was supplied. |
| `detection` | Detection result: classification, numeric confidence, source version, evidence list. `null` if detection did not run. |
| `extraction` | Extraction result: collection name, PRAGMA integrity check result, `failed_rows` list (with per-row reason). `null` if extraction did not run. |
| `extraction_stats` | `{total_rows, parsed_rows, failed_rows, parse_rate}`. `null` if extraction did not run. |
| `transformation` | Transformation summary: `{drawer_count, sample_ids, metadata_keys, wing_room_counts, length_profile, dropped_count}`. `null` if transformation did not run. |
| `reconstruction` | Reconstruction summary: `{target_path, collection_name, imported_count, batch_size, chromadb_version, target_manifest_path}`. `null` if reconstruction did not run (e.g. `analyze`, `inspect`). |
| `validation` | Validation result: `{outcome, confidence_band, checks_not_performed, outcomes}`. `null` if validation did not run. |
| `stages` | Per-stage status map: each stage is `executed`, `aborted`, `skipped`, or `not_run`. |
| `confidence_summary` | `{detection_band, extraction_band, overall_band}`. Reflects the weakest confidence band observed across all stages that ran. |
| `anomalies` | List of structured anomaly objects. Each has `type` (registered enum value), `severity` (`low`/`medium`/`high`/`critical`), `location.stage`, `message`, and `evidence` list. Always present; may be empty. |
| `anomaly_summary` | `{by_severity, by_stage, top_severity, total_count}`. Always present. |
| `explicitly_not_checked` | List of condition strings naming checks the tool does not perform. Always present, always non-empty. |
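An entry in the `anomalies` list can be pictured as the following shape. This is an illustrative reconstruction from the table above; the real types live in `core/context.py`, and the field values here are invented:

```python
from dataclasses import dataclass, field

@dataclass
class Anomaly:
    type: str              # registered AnomalyType enum value
    severity: str          # low / medium / high / critical
    stage: str             # location.stage in the report
    message: str
    evidence: list = field(default_factory=list)  # at least one entry

a = Anomaly(type="row_integrity_failure", severity="high", stage="extract",
            message="row failed per-row integrity check", evidence=["rowid=17"])
print(a.severity)  # high
```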
Operators are expected to inspect the report. A non-empty
failed_rows list means data was excluded from the reconstruction. A
non-empty anomalies list with severity >= high means the run
contains conditions that warrant manual review before the output is
trusted for any purpose.
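That inspection step can be automated with a small post-run check. This is a hypothetical script built on the report keys described above (the anomaly `type` value is invented):

```python
def needs_review(report: dict) -> bool:
    """Flag a run whose report shows excluded rows or anomalies at
    severity high or above: conditions that warrant manual review."""
    failed_rows = (report.get("extraction") or {}).get("failed_rows") or []
    severe = [a for a in report.get("anomalies") or []
              if a.get("severity") in ("high", "critical")]
    return bool(failed_rows) or bool(severe)

clean = {"extraction": {"failed_rows": []}, "anomalies": []}
flagged = {"extraction": {"failed_rows": []},
           "anomalies": [{"severity": "high", "type": "metadata_unresolvable"}]}
print(needs_review(clean), needs_review(flagged))  # False True
```

A check like this catches the excluded-row and high-severity cases mechanically, but it does not replace reading the report: semantic accuracy is never asserted.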
| Code | Trigger |
|---|---|
| 0 | Pipeline completed without raising a critical error; no CRITICAL anomaly recorded |
| 1 | CLI usage error (bad arguments, missing required path) |
| 2 | Detection failed (unsupported format, version, or insufficient confidence) |
| 3 | Extraction failed at a critical pre-flight check (PRAGMA failure, WAL not checkpointed) |
| 4 | Transformation failed (extracted data missing or transformation raised) |
| 5 | Reconstruction failed (target path conflict, chromadb write error, or rollback triggered) |
| 6 | Report-builder pipeline error (`MigratorError` from the report stage) |
| 7 | Validation raised unexpectedly (`validate()` normally never raises) |
| 8 | Outcome is success but at least one CRITICAL anomaly was recorded ("silent failure" guard) |
| 9 | `report` subcommand: the specified file could not be read or is not parseable as JSON |
| 10 | Unexpected or unrecognised failure; use `--debug` to surface the traceback |
stdout contains only the text or JSON report. Capture it with shell redirection:

```shell
python -m mempalace_migrator.cli.main migrate SOURCE --target TARGET --json-output > report.json
```

stderr contains all error banners and pipeline messages. They follow
the pattern `[migrator:<run_id>] [<stage>] ERROR: <summary>`.
To separate them in a script:

```shell
python -m mempalace_migrator.cli.main analyze SOURCE --json-output \
  > report.json \
  2> errors.txt
echo "exit: $?"
```

Exit codes are stable across releases and safe to match on in scripts. See the exit-code table above.
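A wrapper script might dispatch on those exit codes like this. The mapping is a hypothetical coarse summary; the exit-code table above is authoritative:

```python
def action_for(code: int) -> str:
    """Coarse dispatch on the migrator's documented exit codes."""
    if code == 0:
        return "inspect the report, then consider using the target"
    if code == 8:
        return "discard the target: success outcome but CRITICAL anomaly"
    if 2 <= code <= 7:
        return "a pipeline stage failed; read the stderr banner and report"
    if code == 9:
        return "report file unreadable or not JSON"
    return "usage error or unexpected failure; rerun with --debug"

print(action_for(0))
print(action_for(5))
print(action_for(8))
```

Treating code 8 differently from 0 matters: it is the "silent failure" guard, where the pipeline finished but recorded a CRITICAL anomaly.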
This tool is intended for operators who:
- understand the structural differences between ChromaDB `0.6.x` and `1.x`,
- can read SQLite directly to check what the tool reports,
- accept that semantic accuracy of the output is not asserted and that the reconstructed palace may need to be discarded after inspection,
- do not need a turnkey upgrade path.
If you need a supported migration product, this is not it.
The following properties are enforced by the test suite. This list is closed: if a property is absent from this table, the project does not commit to it.
| Property | Enforced by |
|---|---|
| Source file bytes are unchanged after any subcommand | `tests/test_migrate_e2e.py::test_source_unchanged` |
| Target directory is rolled back (removed) on any failure after mkdir | `tests/adversarial/test_reconstruction_rollback.py` |
| Every skipped check carries a `SkippedReason` | `validation/_types.py::SkippedReason` + `tests/test_validation_parity.py` |
| Exit code 0 implies no CRITICAL anomaly in the report | `cli/main.py::_decide_exit_code` + `tests/test_cli_migrate.py`, `tests/test_cli.py` |
| Report `schema_version` is a stable integer (currently 5) | `reporting/report_builder.py::REPORT_SCHEMA_VERSION` + `tests/adversarial/_invariants.py::check_schema_stability` |
| Detection accepts only the single documented source/target pair | `detection/format_detector.py::SUPPORTED_VERSION_PAIRS` + `tests/test_format_detector_structured_outputs.py` |
| Reconstruction never writes to the source palace | SQLite `mode=ro` URI + `tests/test_cli_migrate.py::test_migrate_source_byte_identical` |
| Every anomaly has a registered `AnomalyType`, a known stage, and at least one evidence entry | `core/context.py::AnomalyType` + `tests/adversarial/_invariants.py::check_anomaly_well_formedness` |
The following are not in scope and are not committed to:
- Retrieval-result parity between source and target palaces.
- Usage-scenario parity, MCP-runtime parity, or application-level equivalence.
- Embedding-vector numeric equivalence (chromadb `1.x` re-derives embeddings; only embedding presence is checked, and only as a best-effort `medium`-severity check).
- Semantic accuracy or completeness under corruption.
- Performance on inputs significantly larger than the test fixture.
Every pull request against main must pass the verify job defined in
.github/workflows/ci.yml before merging. The job runs on
ubuntu-latest with Python 3.12, installs the package via
pip install -e ".[dev]", executes the full test suite with pytest -q,
checks each subcommand's --help exit code, and runs the end-to-end
migration smoke test. No step may proceed if a prior step fails.
Branch protection on main requires the verify check to pass;
pull requests are not mergeable while the check is absent or red,
except by explicit admin override.
All notable changes are recorded in CHANGELOG.md.
Released versions are published as
GitHub Releases.
This project follows Semantic Versioning 2.0.0; while the major version is 0,
MINOR bumps may break the public contract (CLI surface, exit codes, report schema).
- mempalace-mcp-bridge — the stable bridge between MemPalace and the Model Context Protocol. It is the production-oriented project. `mempalace-migrator` exists separately so that experimental reconstruction work does not affect the bridge's stability or its supported scope.