Repository Guidelines

Project Structure & Modules

automem/: Core package. Notable dirs: api/ (Flask blueprints), utils/, stores/, config.py.
app.py: Flask API entry point used in local/dev and tests.
tests/: Pytest suite (test_*.py), plus legacy benchmark harnesses under tests/benchmarks/.
benchmarks/: Snapshot-based benchmark system. See benchmarks/EXPERIMENT_LOG.md for current baselines and results.
scripts/bench/: Benchmark tooling (ingest, eval, compare, health check).
docs/: API, testing, deployment, monitoring, and env var references.
scripts/: Maintenance and ops helpers (backup, reembed, health monitor).
mcp-sse-server/: Optional MCP bridge used in some deployments.

Build, Test, and Development

make install: Create .venv (and symlink venv -> .venv) and install dev deps. Prefers Python 3.12 and fails fast on incompatible python3.
source .venv/bin/activate: Activate the virtualenv.
make dev: Start local stack via Docker (FalkorDB, Qdrant, API).
make test: Run unit tests (fast, no services).
make test-integration: Start Docker and run full integration tests.
make fmt / make lint: Format with Black and isort; lint with Flake8.
make bench-eval BENCH=locomo-mini: Run snapshot-based benchmark (~2 min). See Benchmarking section below.
make deploy / make status: Deploy to Railway / check deployment status. Quick health check: curl http://localhost:8001/health.
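The quick health check can also be scripted. The following is a minimal sketch, assuming the API is reachable on localhost at port 8001 (the port in the health check above) and that a healthy service answers /health with HTTP 200. The response body is not inspected, since its shape is not specified here, and the function names are illustrative rather than part of automem.

```python
import urllib.error
import urllib.request


def health_url(host: str = "localhost", port: int = 8001) -> str:
    """Build the URL of the /health endpoint."""
    return f"http://{host}:{port}/health"


def is_healthy(url: str, timeout: float = 3.0) -> bool:
    """Return True if the endpoint answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

Calling is_healthy(health_url()) after make dev gives a yes/no answer that is easy to use in a wait loop or a pre-benchmark check.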

Coding Style & Naming

Python with type hints. Indent 4 spaces; line length 100 (Black).
Tools: Black, isort (profile=black), Flake8; pre-commit hooks available.
Run pre-commit install once to enable the hooks, then run make fmt && make lint before each commit.
Naming: modules/functions snake_case, classes PascalCase, constants UPPER_SNAKE_CASE.
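The conventions above, shown together in a short sketch. The names MemoryRecord, top_memories, and DEFAULT_RECALL_LIMIT are hypothetical and chosen only to illustrate naming, type hints, and Black-style wrapping at 100 columns; they are not taken from the automem package.

```python
from dataclasses import dataclass

# Constants: UPPER_SNAKE_CASE.
DEFAULT_RECALL_LIMIT: int = 5


@dataclass
class MemoryRecord:  # Classes: PascalCase.
    content: str
    importance: float = 0.5


# Functions: snake_case, with type hints on parameters and return value.
def top_memories(
    records: list[MemoryRecord], limit: int = DEFAULT_RECALL_LIMIT
) -> list[MemoryRecord]:
    """Return the `limit` most important records, highest importance first."""
    return sorted(records, key=lambda record: record.importance, reverse=True)[:limit]
```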

Testing Guidelines

Framework: Pytest. Place tests in tests/ named test_*.py.
Unit tests: make test.
Integration: make test-integration (requires Docker). See docs/TESTING.md for env flags and live testing options.
Add/adjust tests for new endpoints, stores, or utils; prefer fixtures over globals.
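As an illustration of "prefer fixtures over globals", here is a minimal sketch of a test module. InMemoryStore is a hypothetical stand-in defined inline so the example is self-contained; it is not a class from automem, and a real test would import the store under test instead.

```python
from typing import Optional

import pytest


class InMemoryStore:
    """Hypothetical stand-in for a store; not part of automem."""

    def __init__(self) -> None:
        self._items: dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._items[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._items.get(key)


@pytest.fixture
def store() -> InMemoryStore:
    # A fresh store for every test, so no state leaks between tests
    # the way it could with a module-level global.
    return InMemoryStore()


def test_put_then_get(store: InMemoryStore) -> None:
    store.put("greeting", "hello")
    assert store.get("greeting") == "hello"


def test_missing_key_returns_none(store: InMemoryStore) -> None:
    assert store.get("absent") is None
```

Saved as tests/test_example_store.py (an illustrative file name), this would be collected by make test like any other test_*.py file.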

Benchmarking

The benchmark system uses snapshot-based evaluation: ingest once, eval many times from the same snapshot. This keeps runs deterministic and fast.

Source of truth: benchmarks/EXPERIMENT_LOG.md — contains current baselines, all experiment results, and the tiered benchmark table.

automem is the canonical home for official benchmark harnesses and published benchmark numbers. Use the separate automem-evals repo for exploratory ruleset work, seeded corpora, scenario authoring, cross-agent or cross-backend comparisons, and bulky timestamped result artifacts. External eval repos should treat AutoMem as a black-box service and follow docs/EVALS_CONTRACT.md.

Tiered System

| Tier | Benchmark | Command | Runtime | Cost | When to use |
| --- | --- | --- | --- | --- | --- |
| 0 | Unit tests | `make test` | 30s | free | Every change |
| 1 | LoCoMo-mini (2 convos, 304 Qs) | `make bench-eval BENCH=locomo-mini` | 2-3 min | free / ~$0.20 with judge | Rapid iteration |
| 2 | LoCoMo-full (10 convos, 1986 Qs) | `make bench-eval BENCH=locomo` | 5-10 min | free / ~$1-3 with judge | Before merge |
| 3 | LongMemEval-mini (20 Qs) | `make bench-mini-longmemeval` | 15 min | ~$1 | Scoring/entity changes |
| 4 | LongMemEval-full (500 Qs) | `make test-longmemeval` | 1-2 hr | ~$10 | Milestones only |

Key Commands

make bench-eval BENCH=locomo-mini CONFIG=baseline — eval from snapshot (~2 min).
make bench-compare BENCH=locomo CONFIG=<name> BASELINE=baseline — A/B compare two configs.
make bench-compare-branch BRANCH=<branch> — compare a branch against baseline.
make bench-ingest BENCH=locomo — ingest + snapshot (run once per embedding change).
make bench-health — recall health check (score distribution, entity quality, latency).

Workflow for Recall/Retrieval Changes

1. Run make bench-eval BENCH=locomo-mini on main to confirm the current baseline.
2. Create a feature branch and implement changes.
3. Run the same eval on the branch.
4. Record both results as a new row in benchmarks/EXPERIMENT_LOG.md.
5. Before merging, run the full benchmark: make bench-eval BENCH=locomo.
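The workflow above can be written down as an ordered command list. This is a sketch only: the make targets and BENCH/CONFIG variables come from this document, while the git commands and the helper names bench_eval_command and workflow_commands are assumptions added for illustration. Recording results in benchmarks/EXPERIMENT_LOG.md stays a manual step and is not modeled.

```python
import shlex


def bench_eval_command(bench: str, config: str = "") -> list[str]:
    """Build the argv for `make bench-eval`, adding CONFIG only when given."""
    argv = ["make", "bench-eval", f"BENCH={bench}"]
    if config:
        argv.append(f"CONFIG={config}")
    return argv


def workflow_commands(branch: str) -> list[str]:
    """Return the shell commands for the recall/retrieval workflow, in order."""
    mini = shlex.join(bench_eval_command("locomo-mini"))
    full = shlex.join(bench_eval_command("locomo"))
    return [
        "git checkout main",  # assumed git usage, not prescribed by this guide
        mini,  # confirm the current baseline on main
        f"git checkout -b {shlex.quote(branch)}",
        mini,  # same eval on the feature branch, after implementing changes
        full,  # full LoCoMo run before merge
    ]
```

Printing the list for a branch name such as feat/recall-tuning gives a checklist that can be pasted into a PR test plan.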

Directory Layout

benchmarks/EXPERIMENT_LOG.md — results table and experiment metadata (committed).
benchmarks/baselines/ — baseline result JSONs (small files committed, large ones gitignored).
benchmarks/snapshots/ — Qdrant/FalkorDB snapshot data (gitignored, regenerate with make bench-ingest).
benchmarks/results/ — per-run result JSONs (gitignored).
scripts/bench/ — shell and Python scripts driving ingest, eval, compare, and health checks.
tests/benchmarks/ — legacy benchmark harnesses (LoCoMo, LongMemEval) and historical result markdown files.

Commit & Pull Requests

PR titles must use Conventional Commit format because squash merges use the PR title as the release commit title. Do not prefix titles with [codex], [claude], [copilot], [wip], or similar labels; put agent/status context in the PR body.
Use Conventional Commit types: feat, fix, docs, refactor, test, ci, build, chore, perf, revert (e.g., feat(api): add /analyze endpoint).
For public API changes, use feat(api): ... unless the change is strictly a bug fix with no new public surface. For docs-only changes, use docs: ...; for release automation, use ci(release): ... or chore(release): ....
PRs must include: clear description and scope, linked issues, test plan/output, and notes on API or config changes. Update relevant docs under docs/.
CI must pass; formatting/lint clean.
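A PR title can be checked against these rules before opening the PR. The sketch below is one possible check, not the repository's CI rule: the type list is the one given above, while the allowed scope characters and the optional "!" breaking-change marker (from the Conventional Commits specification) are assumptions.

```python
import re

# Types listed in the guidelines above.
ALLOWED_TYPES = (
    "feat", "fix", "docs", "refactor", "test",
    "ci", "build", "chore", "perf", "revert",
)

TYPE_GROUP = "|".join(ALLOWED_TYPES)

TITLE_PATTERN = re.compile(
    rf"^(?:{TYPE_GROUP})"      # type must come first, so "[wip] feat: ..." fails
    r"(?:\([a-z0-9_/-]+\))?"   # optional scope, e.g. (api); charset is an assumption
    r"!?"                      # optional breaking-change marker
    r": \S.*$"                 # colon, one space, non-empty subject
)


def is_valid_pr_title(title: str) -> bool:
    """Return True if the title looks like a Conventional Commit header."""
    return TITLE_PATTERN.match(title) is not None
```

For example, is_valid_pr_title("feat(api): add /analyze endpoint") passes, while a title prefixed with [codex] or one with no type at all is rejected.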

Security & Configuration

Never commit secrets. Configure via env vars: AUTOMEM_API_TOKEN, ADMIN_API_TOKEN, OPENAI_API_KEY, FALKORDB_PASSWORD, QDRANT_API_KEY.
Local dev uses Docker defaults; see docs/ENVIRONMENT_VARIABLES.md and docker-compose.yml for ports and credentials.
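To see which of these variables are set in a given environment, a small check like the following can help. It is a sketch under assumptions: the variable names are the ones listed above, but which of them are actually required depends on the deployment (see docs/ENVIRONMENT_VARIABLES.md), so the function only reports what is unset and never prints values. The names SECRET_VARS and missing_secrets are illustrative.

```python
import os
from collections.abc import Mapping

# Variable names listed in the guidelines above.
SECRET_VARS = (
    "AUTOMEM_API_TOKEN",
    "ADMIN_API_TOKEN",
    "OPENAI_API_KEY",
    "FALKORDB_PASSWORD",
    "QDRANT_API_KEY",
)


def missing_secrets(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of secret variables that are unset or empty in `env`."""
    return [name for name in SECRET_VARS if not env.get(name)]
```

Passing a plain dict instead of os.environ keeps the check easy to unit test without touching the real environment.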

Agent Memory Protocol

Follow rules in .cursor/rules/automem.mdc for memory operations.
