Anchor

Provenance-first RAG that refuses to hallucinate.

A retrieval layer that returns grounded chunks with their sources when similarity is high, and explicitly refuses when it isn't: no fabrication, no hedging.

Live demo: anchor-iota-ten.vercel.app
Playground: anchor-iota-ten.vercel.app/playground

The problem

RAG tutorials show the happy path. Production lives in the unhappy path.

A cosine similarity of 0.12 between the query and the closest chunk in your corpus is not a foundation for a confident answer — but most RAG systems feed it to the LLM anyway and get a plausible-sounding fabrication. Anchor treats that signal for what it is: too weak to use.
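To make that number concrete, here is a minimal sketch of the score itself. It is not code from this repository: `cosineSimilarity` and the toy 3-dimensional vectors are illustrative stand-ins for real embeddings, which have far more dimensions.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|). Ranges from -1 to 1.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("Vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("Cosine similarity is undefined for a zero vector");
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors (hypothetical): one chunk close to the query, one off-topic.
const query = [1, 0, 0];
const nearMatch = [0.9, 0.1, 0];
const offTopic = [0.12, 0.7, 0.7];

console.log(cosineSimilarity(query, nearMatch).toFixed(2)); // prints "0.99"
console.log(cosineSimilarity(query, offTopic).toFixed(2)); // prints "0.12"
```

In a Postgres + pgvector setup this arithmetic normally happens in the database. pgvector's `<=>` operator returns cosine distance, so similarity is `1 - distance`. Whether Anchor's retriever uses that exact operator is an assumption here.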

Project layout

anchor/
├── src/app/api/               # API routes (query, chat, health, admin/seed)
├── src/app/playground/        # /playground — interactive query UI
├── src/lib/rag/               # retriever, embed-writer, sources, demo-seeder, seed-runner
├── prisma/                    # schema + migrations (incl. CREATE EXTENSION vector) + seed.ts
├── scripts/                   # embed-backfill, calibrate-floor, e2e
├── tests/                     # retriever, embed-writer, sources, query-route tests
├── docs/architecture.md       # full system architecture + sequence diagrams
├── docs/CLAIM_AUDIT.md        # every public claim → file:line that backs it
├── docker-compose.yml         # Postgres + pgvector + app
├── Dockerfile                 # multi-stage production image
└── SPEC.md                    # feature inventory + code locations

Architecture overview

Query → Embed → pgvector cosine similarity → {score ≥ floor?} → Yes: Return chunks + sources / No: Return refused

Cosine floor. Configurable threshold (default 0.30). Below it → empty result, explicit refusal.
Adaptive K. Precision queries get K=6, recall queries get K=10.
Provenance. Every chunk carries its sourceId. The API response includes a structured sources[] array.
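The three behaviours above can be sketched together in a few lines. This is an illustration under stated assumptions, not the code in src/lib/rag/: the names `gate` and `pickK`, the `ScoredChunk` shape, and the idea that the caller passes the query mode in are all hypothetical, and the floor is applied per chunk here, which may differ from how the retriever applies it. Only the defaults (floor 0.30, K=6 or K=10) and the response fields come from this README.

```typescript
// Assumed chunk shape; only sourceId is confirmed by the README.
interface ScoredChunk {
  sourceId: string;
  sourceType: string;
  text: string;
  similarity: number; // cosine similarity to the query, higher is closer
}

// Mirrors the sources[] entries shown in the live demo response.
interface SourceSummary {
  sourceId: string;
  sourceType: string;
  similarity: number; // best similarity among this source's kept chunks
  chunkCount: number;
}

interface QueryResult {
  chunks: ScoredChunk[];
  refused: boolean;
  sources: SourceSummary[];
}

type QueryMode = "precision" | "recall";

const DEFAULT_FLOOR = 0.3;

// Adaptive K: fewer chunks for precision queries, more for recall queries.
function pickK(mode: QueryMode): number {
  return mode === "precision" ? 6 : 10;
}

// Drop chunks below the floor, keep the top K, and group them by source.
function gate(
  candidates: ScoredChunk[],
  mode: QueryMode,
  floor: number = DEFAULT_FLOOR
): QueryResult {
  const kept = candidates
    .filter((c) => c.similarity >= floor)
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, pickK(mode));

  // Nothing cleared the floor: refuse explicitly instead of returning noise.
  if (kept.length === 0) {
    return { chunks: [], refused: true, sources: [] };
  }

  const bySource = new Map<string, SourceSummary>();
  for (const c of kept) {
    const existing = bySource.get(c.sourceId);
    if (existing) {
      existing.chunkCount += 1;
      existing.similarity = Math.max(existing.similarity, c.similarity);
    } else {
      bySource.set(c.sourceId, {
        sourceId: c.sourceId,
        sourceType: c.sourceType,
        similarity: c.similarity,
        chunkCount: 1,
      });
    }
  }
  return { chunks: kept, refused: false, sources: Array.from(bySource.values()) };
}
```

Calling `gate` with a single candidate at similarity 0.12 yields the refused shape (`chunks: []`, `refused: true`, `sources: []`), which matches the refusal response shown in the live demo.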

Live demo

# Refused state (off-topic for the seeded corpus)
curl -X POST https://anchor-iota-ten.vercel.app/api/query \
  -H "Content-Type: application/json" \
  -d '{"q":"xkcd 18472 nonsense gibberish"}'
# → {"chunks":[],"refused":true,"sources":[]}

# Grounded state (matches the seeded corpus — Ahmedabad real-estate)
curl -X POST https://anchor-iota-ten.vercel.app/api/query \
  -H "Content-Type: application/json" \
  -d '{"q":"Which Goyal & Co. projects in Shela are ready to move in?"}'
# → {"chunks":[...],"refused":false,"sources":[{"sourceId":"...","sourceType":"project","similarity":0.7,"chunkCount":2}, ...]}

The seeded corpus is the Ahmedabad (Shela / South Bopal / Bopal) real-estate dataset — 16 projects, 5 builders, 4 localities, 4 infra items, 31 POIs. On-topic queries about those entities retrieve; anything else is refused.
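The same calls can be made from TypeScript. The sketch below is a hypothetical client, not part of the repository: `buildQueryRequest`, `parseQueryResponse`, and `queryAnchor` are illustrative names. The request body and the top-level response fields follow the curl examples above; the shape of individual chunks is not shown in this README, so they are left as `unknown`.

```typescript
// Mirrors the sources[] entries shown in the curl output above.
interface SourceSummary {
  sourceId: string;
  sourceType: string;
  similarity: number;
  chunkCount: number;
}

interface QueryResponse {
  chunks: unknown[]; // chunk shape is not documented here, so left untyped
  refused: boolean;
  sources: SourceSummary[];
}

interface QueryRequestInit {
  method: string;
  headers: Record<string, string>;
  body: string;
}

// Minimal structural type so the sketch does not depend on DOM typings.
type FetchLike = (
  url: string,
  init: QueryRequestInit
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Build the same POST request the curl examples send.
function buildQueryRequest(q: string): QueryRequestInit {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ q }),
  };
}

// Check the top-level fields before trusting the payload.
function parseQueryResponse(payload: unknown): QueryResponse {
  if (typeof payload !== "object" || payload === null) {
    throw new Error("Expected a JSON object");
  }
  const { chunks, refused, sources } = payload as {
    chunks?: unknown;
    refused?: unknown;
    sources?: unknown;
  };
  if (!Array.isArray(chunks) || typeof refused !== "boolean" || !Array.isArray(sources)) {
    throw new Error("Response is missing chunks, refused, or sources");
  }
  return { chunks, refused, sources: sources as SourceSummary[] };
}

// Send a query and return the parsed result; the fetch implementation is injected.
async function queryAnchor(
  fetchImpl: FetchLike,
  baseUrl: string,
  q: string
): Promise<QueryResponse> {
  const res = await fetchImpl(`${baseUrl}/api/query`, buildQueryRequest(q));
  if (!res.ok) {
    throw new Error(`Anchor query failed with HTTP ${res.status}`);
  }
  return parseQueryResponse(await res.json());
}
```

In Node 18+ or a browser, the global `fetch` can be passed as `fetchImpl`, for example `queryAnchor(fetch, "https://anchor-iota-ten.vercel.app", "...")`. Callers should branch on `refused` before using `chunks`, since a refusal is a normal response rather than an error.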

Stack

Layer	Choice
Vector DB	Postgres + pgvector
ORM	Prisma 7
API	Next.js 16 (App Router)
Embeddings	OpenAI `text-embedding-3-small`
Deploy	Vercel
License	Apache 2.0

~970 LOC. No RAG framework, no managed vector database.

Known limitations

No LLM generation. Retrieval-only. Wire it to your model's system prompt yourself.
Small demo corpus. 16 projects — not 100k+ documents.
Single-stage retrieval. No re-ranking, no hybrid BM25. The afterRetrieve(chunks) hook is exposed so you can add your own.
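Two of these gaps are meant to be filled by the caller, and a rough sketch of each may help. Everything below is an assumption for illustration: this README only names `afterRetrieve(chunks)`, so the chunks-in, chunks-out signature, the `Chunk` shape, and the helper names `makeKeywordReranker` and `buildSystemPrompt` are hypothetical. The re-ranker is a naive keyword-overlap pass, not BM25.

```typescript
// Assumed chunk shape; only sourceId is confirmed by the README.
interface Chunk {
  sourceId: string;
  text: string;
  similarity: number;
}

interface RetrievalResult {
  chunks: Chunk[];
  refused: boolean;
}

// Hypothetical afterRetrieve implementation: reorder chunks by how many
// query terms they contain, breaking ties by similarity. The query is
// captured in a closure because the hook itself only receives chunks.
function makeKeywordReranker(query: string): (chunks: Chunk[]) => Chunk[] {
  const terms = query
    .toLowerCase()
    .split(/\W+/)
    .filter((t) => t.length > 2);
  const overlap = (c: Chunk): number => {
    const text = c.text.toLowerCase();
    return terms.filter((t) => text.indexOf(t) !== -1).length;
  };
  // Copy before sorting so the caller's array is left untouched.
  return (chunks) =>
    chunks
      .slice()
      .sort((a, b) => overlap(b) - overlap(a) || b.similarity - a.similarity);
}

// Turn a retrieval result into a system prompt for whichever LLM you use.
function buildSystemPrompt(result: RetrievalResult): string {
  if (result.refused || result.chunks.length === 0) {
    return "No relevant context was retrieved. Tell the user you cannot answer from the available sources.";
  }
  const context = result.chunks
    .map((c, i) => `[${i + 1}] (source: ${c.sourceId}) ${c.text}`)
    .join("\n");
  return `Answer using only the context below and cite sources by their bracketed number.\n\n${context}`;
}
```

Keeping the refusal branch in the prompt builder preserves the refusal end to end: the model is told there is no context rather than being left to improvise.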

Quickstart (clean machine, <5 min)

The only thing you bring is an OPENAI_API_KEY. Docker provides Postgres + pgvector — no hosted database required.

# 1. Clone
git clone https://github.com/ykstorm/anchor.git && cd anchor

# 2. Configure — open .env and paste your OPENAI_API_KEY.
#    DATABASE_URL is already set for the docker-compose Postgres.
cp .env.example .env

# 3. Start Postgres + pgvector (creates the `vector` extension on first boot)
docker-compose up -d

# 4. Install deps
npm install

# 5. Provision the schema (applies prisma/migrations — tables + vector column)
npx prisma migrate deploy

# 6. Seed the corpus (60 rows) and embed it into pgvector
npm run seed

# 7. Run
npm run dev

Open http://localhost:3000/playground and try:

Which Goyal & Co. projects in Shela are ready to move in? → retrieved (chunks + sources[])
xkcd 18472 nonsense gibberish → refused (refused: true, empty chunks, empty sources)

npm run seed needs OPENAI_API_KEY to embed. Without a key it still seeds the structured rows and tells you to re-run once the key is set.

License

Apache 2.0 — see LICENSE.
