- `automem/`: Core package. Notable dirs: `api/` (Flask blueprints), `utils/`, `stores/`, `config.py`.
- `app.py`: Flask API entry point used in local/dev and tests.
- `tests/`: Pytest suite (`test_*.py`), plus legacy benchmark harnesses under `tests/benchmarks/`.
- `benchmarks/`: Snapshot-based benchmark system. See `benchmarks/EXPERIMENT_LOG.md` for current baselines and results.
- `scripts/bench/`: Benchmark tooling (ingest, eval, compare, health check).
- `docs/`: API, testing, deployment, monitoring, and env var references.
- `scripts/`: Maintenance and ops helpers (backup, reembed, health monitor).
- `mcp-sse-server/`: Optional MCP bridge used in some deployments.
- `make install`: Create `.venv` (and symlink `venv -> .venv`) and install dev deps. Prefers Python 3.12 and fails fast on an incompatible `python3`.
- `source .venv/bin/activate`: Activate the virtualenv.
- `make dev`: Start the local stack via Docker (FalkorDB, Qdrant, API).
- `make test`: Run unit tests (fast, no services).
- `make test-integration`: Start Docker and run the full integration tests.
- `make fmt` / `make lint`: Format with Black/Isort and lint with Flake8.
- `make bench-eval BENCH=locomo-mini`: Run the snapshot-based benchmark (~2 min). See the Benchmarking section below.
- `make deploy` / `make status`: Deploy to / check status on Railway. Quick health check: `curl http://localhost:8001/health`.
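After `make dev`, the API may take a few seconds to come up. The following is a hypothetical Python helper (not part of the repo) that polls the health endpoint until it responds; it assumes the local API listens on port 8001 and that `/health` returns HTTP 200 when the stack is ready.

```python
import time
import urllib.request


def wait_for_health(
    url: str = "http://localhost:8001/health",
    timeout_s: float = 30.0,
    interval_s: float = 1.0,
) -> bool:
    """Poll the health endpoint until it answers HTTP 200 or the timeout expires.

    Hypothetical helper; assumes /health returns 200 once the stack is ready.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError, and socket timeouts all subclass OSError:
            # the stack is still starting, or the endpoint is unhealthy.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Polling with a deadline (rather than a single request) avoids flaky failures when Docker containers start in a different order.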
- Python with type hints. Indent 4 spaces; line length 100 (Black).
- Tools: Black, Isort (profile=black), Flake8; pre-commit hooks available.
- Run `pre-commit install` and `make fmt && make lint` before committing.
- Naming: modules/functions `snake_case`, classes `PascalCase`, constants `UPPER_SNAKE_CASE`.
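The conventions above, shown in a small example with type hints and 4-space indentation. All names here are hypothetical and chosen only to illustrate the naming rules; they are not taken from the `automem` package.

```python
from __future__ import annotations

from dataclasses import dataclass

# Constants: UPPER_SNAKE_CASE (hypothetical value).
DEFAULT_RECALL_LIMIT = 10


# Classes: PascalCase.
@dataclass
class RecallQuery:
    text: str
    limit: int = DEFAULT_RECALL_LIMIT


# Functions (and modules): snake_case, with type hints on parameters and return values.
def normalize_query_text(raw_text: str) -> str:
    """Collapse runs of whitespace and lowercase the query text."""
    return " ".join(raw_text.split()).lower()


def build_recall_query(raw_text: str, limit: int | None = None) -> RecallQuery:
    """Build a query object, falling back to the default limit when none is given."""
    return RecallQuery(
        text=normalize_query_text(raw_text),
        limit=limit if limit is not None else DEFAULT_RECALL_LIMIT,
    )
```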
- Framework: Pytest. Place tests in `tests/` as files named `test_*.py`.
- Unit tests: `make test`.
- Integration: `make test-integration` (requires Docker). See `docs/TESTING.md` for env flags and live testing options.
- Add/adjust tests for new endpoints, stores, or utils; prefer fixtures over globals.
The benchmark system uses snapshot-based evaluation: ingest once, eval many times from the same snapshot. This keeps runs deterministic and fast.
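The ingest-once, eval-many pattern can be sketched as follows. This is a hypothetical illustration only: the real tooling in `scripts/bench/` snapshots Qdrant/FalkorDB state, whereas this sketch keeps an in-memory snapshot keyed on the benchmark name and embedding settings, so the expensive ingest reruns only when those change. `SnapshotCache`, `run_eval`, and the snapshot dict layout are all assumptions.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SnapshotCache:
    """Hypothetical stand-in for benchmarks/snapshots/: one snapshot per (bench, embedding config)."""

    snapshots: dict[str, dict] = field(default_factory=dict)
    ingest_count: int = 0

    @staticmethod
    def key(bench: str, embedding_config: dict) -> str:
        # Sorted JSON gives a stable key: the same config always maps to the same snapshot.
        payload = json.dumps({"bench": bench, "embedding": embedding_config}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def get_or_ingest(self, bench: str, embedding_config: dict, ingest: Callable[[], dict]) -> dict:
        k = self.key(bench, embedding_config)
        if k not in self.snapshots:
            # The expensive step (embedding + graph writes) runs only on a cache miss.
            self.snapshots[k] = ingest()
            self.ingest_count += 1
        return self.snapshots[k]


def run_eval(snapshot: dict, retrieval_config: dict) -> float:
    """Hypothetical eval: fraction of questions whose answer is among the top-k stored memories."""
    k = retrieval_config["top_k"]
    hits = sum(q["answer"] in snapshot["memories"][:k] for q in snapshot["questions"])
    return hits / len(snapshot["questions"])
```

Because every eval reads the same frozen snapshot, differences between runs come from the retrieval/scoring config alone, which is what keeps the comparisons deterministic.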
Source of truth: `benchmarks/EXPERIMENT_LOG.md`, which contains current baselines, all experiment results, and the tiered benchmark table.
`automem` is the canonical home for official benchmark harnesses and published benchmark numbers. Use the separate `automem-evals` repo for exploratory ruleset work, seeded corpora, scenario authoring, cross-agent or cross-backend comparisons, and bulky timestamped result artifacts. External eval repos should treat AutoMem as a black-box service and follow `docs/EVALS_CONTRACT.md`.
| Tier | Benchmark | Command | Runtime | Cost | When to use |
|---|---|---|---|---|---|
| 0 | Unit tests | `make test` | 30s | free | Every change |
| 1 | LoCoMo-mini (2 convos, 304 Qs) | `make bench-eval BENCH=locomo-mini` | 2-3 min | free / ~$0.20 with judge | Rapid iteration |
| 2 | LoCoMo-full (10 convos, 1986 Qs) | `make bench-eval BENCH=locomo` | 5-10 min | free / ~$1-3 with judge | Before merge |
| 3 | LongMemEval-mini (20 Qs) | `make bench-mini-longmemeval` | 15 min | ~$1 | Scoring/entity changes |
| 4 | LongMemEval-full (500 Qs) | `make test-longmemeval` | 1-2 hr | ~$10 | Milestones only |
- `make bench-eval BENCH=locomo-mini CONFIG=baseline`: eval from snapshot (~2 min).
- `make bench-compare BENCH=locomo CONFIG=<name> BASELINE=baseline`: A/B compare two configs.
- `make bench-compare-branch BRANCH=<branch>`: compare a branch against baseline.
- `make bench-ingest BENCH=locomo`: ingest + snapshot (run once per embedding change).
- `make bench-health`: recall health check (score distribution, entity quality, latency).
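To make the A/B idea concrete, here is a hypothetical sketch of comparing two runs question by question. The `{question_id: correct}` input shape is an assumption for illustration; the actual result JSONs under `benchmarks/results/` may use a different schema, and this is not how `make bench-compare` is implemented.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Comparison:
    baseline_accuracy: float
    candidate_accuracy: float
    fixed: list[str]  # wrong in baseline, right in candidate
    regressed: list[str]  # right in baseline, wrong in candidate

    @property
    def delta(self) -> float:
        return self.candidate_accuracy - self.baseline_accuracy


def compare_results(baseline: dict[str, bool], candidate: dict[str, bool]) -> Comparison:
    """Compare two runs over the same question set (hypothetical {question_id: correct} shape)."""
    if baseline.keys() != candidate.keys():
        raise ValueError("runs must cover the same questions; eval both configs from one snapshot")
    ids = sorted(baseline)
    return Comparison(
        baseline_accuracy=sum(baseline[i] for i in ids) / len(ids),
        candidate_accuracy=sum(candidate[i] for i in ids) / len(ids),
        fixed=[i for i in ids if not baseline[i] and candidate[i]],
        regressed=[i for i in ids if baseline[i] and not candidate[i]],
    )
```

Listing fixed and regressed questions alongside the aggregate delta matters because a small net change can hide many offsetting wins and losses.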
- Run `make bench-eval BENCH=locomo-mini` on `main` to confirm the current baseline.
- Create a feature branch and implement changes.
- Run the same eval on the branch.
- Record both results as a new row in `benchmarks/EXPERIMENT_LOG.md`.
- Promote to `make bench-eval BENCH=locomo` (full) before merge.
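For the recording step, a hypothetical helper can render a markdown table row from the two runs. The column layout (date, branch, benchmark, baseline, branch result, delta, notes) is an assumption; match whatever columns `benchmarks/EXPERIMENT_LOG.md` actually uses.

```python
from __future__ import annotations

from datetime import date


def format_experiment_row(
    run_date: date,
    branch: str,
    bench: str,
    baseline_accuracy: float,
    branch_accuracy: float,
    notes: str = "",
) -> str:
    """Render one markdown table row (hypothetical column layout)."""
    delta_pts = (branch_accuracy - baseline_accuracy) * 100
    cells = [
        run_date.isoformat(),
        f"`{branch}`",
        bench,
        f"{baseline_accuracy:.1%}",
        f"{branch_accuracy:.1%}",
        f"{delta_pts:+.1f} pts",
        notes.replace("|", "\\|"),  # escape pipes so the row does not split into extra cells
    ]
    return "| " + " | ".join(cells) + " |"
```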
- `benchmarks/EXPERIMENT_LOG.md`: results table and experiment metadata (committed).
- `benchmarks/baselines/`: baseline result JSONs (small files committed, large ones gitignored).
- `benchmarks/snapshots/`: Qdrant/FalkorDB snapshot data (gitignored; regenerate with `make bench-ingest`).
- `benchmarks/results/`: per-run result JSONs (gitignored).
- `scripts/bench/`: shell and Python scripts driving ingest, eval, compare, and health checks.
- `tests/benchmarks/`: legacy benchmark harnesses (LoCoMo, LongMemEval) and historical result markdown files.
- PR titles must use Conventional Commit format because squash merges use the PR title as the release commit title. Do not prefix titles with `[codex]`, `[claude]`, `[copilot]`, `[wip]`, or similar labels; put agent/status context in the PR body.
- Use Conventional Commit types: `feat`, `fix`, `docs`, `refactor`, `test`, `ci`, `build`, `chore`, `perf`, `revert` (e.g., `feat(api): add /analyze endpoint`).
- For public API changes, use `feat(api): ...` unless the change is strictly a bug fix with no new public surface. For docs-only changes, use `docs: ...`; for release automation, use `ci(release): ...` or `chore(release): ...`.
- PRs must include a clear description and scope, linked issues, a test plan/output, and notes on API or config changes. Update relevant docs under `docs/`.
- CI must pass, and formatting/lint must be clean.
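A rough local check for the title rules above, as a hypothetical Python helper (not an existing repo script; CI may enforce titles differently, if at all). It also accepts the `!` breaking-change marker, which the Conventional Commits specification allows.

```python
import re

# Allowed Conventional Commit types, as listed in the PR guidelines.
ALLOWED_TYPES = ("feat", "fix", "docs", "refactor", "test", "ci", "build", "chore", "perf", "revert")

# type, optional (scope), optional "!" (breaking change, per the spec), then ": " and a subject.
TITLE_RE = re.compile(
    r"^(?P<type>" + "|".join(ALLOWED_TYPES) + r")(?:\((?P<scope>[^()\s]+)\))?!?: (?P<subject>\S.*)$"
)
# Leading bracketed labels such as [codex], [claude], [copilot], [wip].
LABEL_PREFIX_RE = re.compile(r"^\s*\[[^\]]+\]")


def check_pr_title(title: str) -> list:
    """Return a list of problems; an empty list means the title looks valid."""
    problems = []
    if LABEL_PREFIX_RE.match(title):
        problems.append("remove the bracketed prefix; put agent/status context in the PR body")
    if not TITLE_RE.match(title.strip()):
        problems.append("use 'type(scope): subject' with one of: " + ", ".join(ALLOWED_TYPES))
    return problems
```

For example, `check_pr_title("feat(api): add /analyze endpoint")` returns an empty list, while a title starting with `[wip]` is flagged.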
- Never commit secrets. Configure via env vars: `AUTOMEM_API_TOKEN`, `ADMIN_API_TOKEN`, `OPENAI_API_KEY`, `FALKORDB_PASSWORD`, `QDRANT_API_KEY`.
- Local dev uses Docker defaults; see `docs/ENVIRONMENT_VARIABLES.md` and `docker-compose.yml` for ports and credentials.
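A minimal sketch of reading these secrets from the environment at startup instead of hard-coding them. The split into required and optional variables is an assumption for illustration (check `docs/ENVIRONMENT_VARIABLES.md` for the real rules), and `load_secrets` is a hypothetical helper, not part of `automem/config.py`.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Assumption for illustration: which variables are required vs optional may differ in practice.
REQUIRED_VARS = ("AUTOMEM_API_TOKEN", "OPENAI_API_KEY")
OPTIONAL_VARS = ("ADMIN_API_TOKEN", "FALKORDB_PASSWORD", "QDRANT_API_KEY")


def load_secrets(environ: Mapping[str, str] | None = None) -> dict[str, str | None]:
    """Read secrets from the environment and fail fast if required ones are missing."""
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        # Report variable names only; never echo secret values into logs or errors.
        raise RuntimeError(f"missing required environment variables: {', '.join(missing)}")
    secrets: dict[str, str | None] = {name: env[name] for name in REQUIRED_VARS}
    # Treat empty strings as unset for optional variables.
    secrets.update({name: env.get(name) or None for name in OPTIONAL_VARS})
    return secrets
```

Accepting an explicit mapping (defaulting to `os.environ`) keeps the helper easy to unit test without mutating the process environment.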
Follow the rules in `.cursor/rules/automem.mdc` for memory operations.