🧠 semantic-recall

Give your AI a brain that remembers.
Persistent semantic memory for LLM apps — zero config, zero API keys, two methods.

📦 View on npm · ⭐ Star on GitHub · 🐛 Report a Bug · 💡 Request a Feature

Every LLM chatbot forgets everything between sessions. Users repeat themselves. Context is lost. semantic-recall fixes this with two methods: remember() and recall().

import { Memory } from 'semantic-recall'

const memory = new Memory({ userId: 'user_123' })

memory.remember("User is vegetarian and allergic to nuts")

const facts = await memory.recall("What should I recommend for dinner?")
// → ["User is vegetarian and allergic to nuts"]

No vector database. No API keys. No Docker containers. Just npm install and go.

Why semantic-recall?

Most memory solutions require you to set up infrastructure, manage API keys, or lock into a paid platform. semantic-recall is different:

Feature	semantic-recall	Mem0	Zep	LangChain Memory
npm install & go	✅	❌ Requires API key or self-host setup	❌ Requires server (Docker)	✅
Works offline	✅ Local embeddings	❌ Cloud API calls	❌ Server required	❌ No built-in embeddings
Persistent across sessions	✅ SQLite on disk	✅ Cloud-managed	✅ Server-managed	❌ In-memory by default
Semantic search	✅ Cosine similarity	✅	✅ Knowledge graph	❌ Keyword/buffer only
Auto deduplication	✅ Configurable threshold	✅	❌	❌
Crash recovery	✅ Persistent queue	❌	❌	❌
Worker thread isolation	✅ CPU never blocks	❌	N/A (separate server)	❌
TTL / auto-expiry	✅ `"7d"`, `"1h"`	❌	❌	❌
Multi-tenant	✅ userId + namespace	✅ user/session/agent	✅ Sessions	❌
Bundle size	~67 KB	Cloud SDK	Cloud SDK	Large framework
Free & open-source	✅ MIT, forever	Freemium (paid tiers)	Freemium (credit-based)	✅ MIT
Self-contained	✅ Single package	❌ Platform dependency	❌ Server + Redis + Postgres	❌ Framework dependency

TL;DR: among the options compared above, semantic-recall is the only one that offers persistent, semantic, crash-safe memory with zero infrastructure and zero API keys out of the box.

Installation

npm install semantic-recall

First-run note: The first call downloads a ~25 MB embedding model to a local cache. After that, the default setup (local embeddings with SQLite storage) runs offline with zero network calls.

Works Great With

OpenAI Node SDK — inject recalled facts directly into your messages[] array
Vercel AI SDK — wrap recall() as a tool call for streaming chat apps
LangChain JS — use as a persistent, semantic drop-in memory module
Turso — serverless edge storage adapter built-in
Supabase — Postgres storage adapter built-in
Transformers.js — powers the local offline embeddings under the hood

Quick Start

The Basics — `remember()` and `recall()`

import { Memory } from 'semantic-recall'

const memory = new Memory({ userId: 'user_123' })

// Store memories (fire-and-forget — returns instantly, never throws)
memory.remember("User prefers dark mode")
memory.remember("User is a senior TypeScript developer")
memory.remember("User lives in San Francisco")

// Retrieve relevant context for your LLM prompt
const context = await memory.recall("What IDE theme should I suggest?")
// → ["User prefers dark mode"]

// Inject into your system prompt
const systemPrompt = `You are a helpful assistant.
Known facts about the user:
${context.map(f => `- ${f}`).join('\n')}`

Synchronous Confirmation

const result = await memory.rememberAndWait("User is vegetarian")
console.log(result) // → { saved: true, duplicate: false }

const result2 = await memory.rememberAndWait("User is vegetarian")
console.log(result2) // → { saved: false, duplicate: true }

Namespaces — Organize by Topic

const memory = new Memory({ userId: 'user_123', namespace: 'health' })

memory.remember("User is allergic to peanuts")

// Only searches the 'health' namespace
const health = await memory.recall("allergies")

// Override the namespace for a single query; nothing is stored under 'work' yet
const work = await memory.recall("allergies", { namespace: 'work' }) // → []

TTL — Auto-Expiring Memories

// Memory expires after 7 days
memory.remember("User is in Paris for a conference", { ttl: "7d" })

// Supported formats: '500ms', '60s', '30m', '12h', '7d'
memory.remember("Session preference: compact view", { ttl: "1h" })
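
To make the accepted duration formats concrete, here is a minimal sketch of how a TTL string could be turned into milliseconds. `parseTtl` and `UNIT_MS` are hypothetical names written for illustration, not part of the semantic-recall API, and the library's own parser may treat edge cases (such as invalid input) differently.

```typescript
// Hypothetical lookup table: milliseconds per supported unit.
const UNIT_MS: Record<string, number> = {
  ms: 1,
  s: 1_000,
  m: 60_000,
  h: 3_600_000,
  d: 86_400_000,
}

// Hypothetical helper: converts a TTL string such as "7d" into milliseconds.
function parseTtl(ttl: string): number {
  // One or more digits followed by a supported unit, e.g. "500ms" or "12h".
  const match = /^(\d+)(ms|s|m|h|d)$/.exec(ttl.trim())
  if (!match) {
    throw new Error(`Unsupported TTL format: "${ttl}"`)
  }
  const [, amount, unit] = match
  return Number(amount) * UNIT_MS[unit]
}

// Expiry timestamp for a memory stored now with a 7-day TTL.
const expiresAt = Date.now() + parseTtl('7d')
```

Storing an absolute expiry timestamp like `expiresAt`, rather than the raw TTL string, is one simple way to let a later prune step compare against the current time.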

LLM Auto-Extraction

Automatically extract memorable facts from conversations:

const memory = new Memory({
  userId: 'user_123',
  llmProvider: 'openai',
  llmApiKey: process.env.OPENAI_API_KEY,
})

await memory.extractAndRemember([
  { role: 'user', content: "I just moved to Tokyo from London" },
  { role: 'assistant', content: "Welcome to Tokyo! How exciting..." },
  { role: 'user', content: "Yeah, I'm starting a new job as an ML engineer at Google" },
])
// Automatically extracts and stores:
// → "User lives in Tokyo"
// → "User previously lived in London"
// → "User works as an ML engineer at Google"

Supported providers: 'openai' · 'gemini' · 'claude' · or any custom LLMFunction.

How It Works

remember("user is vegetarian")
         │
         ▼
  ┌─────────────┐     ┌──────────────┐     ┌─────────────────┐
  │   Enqueue    │────▶│  Embed Text  │────▶│  Dedup Check    │
  │ (persistent  │     │ (worker      │     │ (cosine sim     │
  │  queue)      │     │  thread)     │     │  ≥ 0.92?)       │
  └─────────────┘     └──────────────┘     └────────┬────────┘
                                                     │
                                            ┌────────┴────────┐
                                            │                 │
                                       Unique            Duplicate
                                            │                 │
                                            ▼                 ▼
                                     ┌────────────┐    ┌────────────┐
                                     │   INSERT    │    │   Skip     │
                                     │ + emit      │    │ (mark done)│
                                     │ memory:saved│    │            │
                                     └────────────┘    └────────────┘
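
The dedup check in the diagram compares the new embedding with the ones already stored. Below is a minimal sketch of that comparison, assuming plain `number[]` vectors and a linear scan over stored embeddings. `cosineSimilarity` and `isDuplicate` are illustrative names, not library exports, and the library's internals may differ.

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`)
  }
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB)
  // A zero vector has no direction, so treat it as dissimilar to everything.
  return denom === 0 ? 0 : dot / denom
}

// A candidate counts as a duplicate if any stored vector is at least `threshold` similar.
function isDuplicate(candidate: number[], stored: number[][], threshold = 0.92): boolean {
  return stored.some(vec => cosineSimilarity(candidate, vec) >= threshold)
}
```

Raising the threshold toward 1 makes deduplication stricter (only near-identical phrasings are skipped), while lowering it merges more loosely related statements.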

Reliability — Built Like Infrastructure

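Every call to remember() is crash-safe. Memories are first written to a persistent pending_memories queue, then processed asynchronously. If your process crashes mid-pipeline, the job is still in the queue and is picked up again on the next startup. Each job moves through the following states: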

PENDING ──▶ PROCESSING ──▶ DONE
                │
                ▼
             FAILED ──(exponential backoff)──▶ PENDING
                │
                ▼ (after max attempts)
              DEAD ──(manual retry)──▶ PENDING

Stale recovery: On startup, stuck PROCESSING jobs are automatically reset to PENDING
Exponential backoff: Failed jobs retry with 2^n second delays (2s → 4s → 8s)
Dead letter queue: After max attempts, jobs move to DEAD for manual inspection
Never throws: remember() swallows all errors — your app never crashes because of memory storage
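
The retry rules above can be sketched as a few pure functions. This is an illustration under stated assumptions, not the library's implementation: the function names are hypothetical, the attempt counter is assumed to be 1-based, and the exact point at which a job becomes DEAD is assumed to be when the attempt count reaches maxAttempts.

```typescript
type JobStatus = 'PENDING' | 'PROCESSING' | 'DONE' | 'FAILED' | 'DEAD'

// Delay before the n-th retry: 2^n seconds (n = 1 gives 2s, n = 2 gives 4s, n = 3 gives 8s).
function backoffDelayMs(attempt: number): number {
  return 2 ** attempt * 1_000
}

// Status after a processing attempt fails: FAILED (retryable) until
// the attempt count reaches maxAttempts, then DEAD.
function statusAfterFailure(attempts: number, maxAttempts = 3): JobStatus {
  return attempts >= maxAttempts ? 'DEAD' : 'FAILED'
}

// Stale recovery: a job left in PROCESSING by a crash goes back to PENDING;
// every other status is left untouched.
function recoverOnStartup(status: JobStatus): JobStatus {
  return status === 'PROCESSING' ? 'PENDING' : status
}
```

Keeping these transitions as pure functions makes the state machine easy to unit test independently of the storage backend.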

Observability

Real-time events for monitoring and debugging:

memory.on('memory:saved', ({ content, jobId }) => {
  console.log(`✓ Saved: "${content}" (id: ${jobId})`)
})

memory.on('memory:duplicate', ({ content }) => {
  console.log(`⊘ Duplicate skipped: "${content}"`)
})

memory.on('memory:retry', ({ content, error, attempts }) => {
  console.warn(`↻ Retry #${attempts}: "${content}" — ${error}`)
})

memory.on('memory:dead', ({ content, error }) => {
  console.error(`☠ Dead: "${content}" — ${error}`)
})

Storage Adapters

SQLite (Default) — Zero Config

Works everywhere with a filesystem. WAL mode enabled for concurrent reads.

const memory = new Memory({
  userId: 'user_123',
  dbPath: './my-memories.db', // default: './semantic-recall.db'
})

Turso — Serverless Edge

For serverless and edge deployments with Turso:

npm install @libsql/client

import { Memory } from 'semantic-recall'
import { createTursoAdapter } from 'semantic-recall/adapters/storage/turso'

const memory = new Memory({
  userId: 'user_123',
  storage: createTursoAdapter({
    url: 'libsql://your-db.turso.io',
    authToken: 'your-token',
  }),
})

Supabase — Postgres Scale

For production Postgres deployments with Supabase:

npm install @supabase/supabase-js

import { Memory } from 'semantic-recall'
import { createSupabaseAdapter } from 'semantic-recall/adapters/storage/supabase'

const memory = new Memory({
  userId: 'user_123',
  storage: createSupabaseAdapter({
    url: 'https://your-project.supabase.co',
    anonKey: 'your-anon-key',
    dimensions: 384,
  }),
})

Custom Adapter

Implement the StorageAdapter interface for any backend:

import { Memory, type StorageAdapter } from 'semantic-recall'

const myAdapter: StorageAdapter = {
  async init() { /* create tables */ },
  async insertMemory(params) { /* insert */ },
  async searchMemories(params) { /* return all rows */ },
  async deleteMemory(id) { /* delete by id */ },
  async deleteAllMemories(userId, namespace) { /* bulk delete */ },
  async listMemories(userId, namespace, limit) { /* list */ },
  async pruneExpired(userId) { /* remove expired */ },
  async enqueue(job) { /* queue job, return id */ },
  async markProcessing(jobId) { /* update status */ },
  async markDone(jobId) { /* update status */ },
  async markFailed(jobId, error) { /* update status + backoff */ },
  async getRetryable() { /* return pending/failed jobs */ },
  async getDeadJobs(userId) { /* return dead jobs */ },
  async resetStaleProcessing() { /* crash recovery */ },
  async cleanupDoneJobs(olderThanMs) { /* prune */ },
  async retryDeadJob(jobId) { /* reset dead → pending */ },
  close() { /* cleanup */ },
}

const memory = new Memory({ userId: 'user_123', storage: myAdapter })

Embedder Adapters

Local (Default) — No API Keys

Uses Transformers.js in an isolated worker thread. The main thread is never blocked.

const memory = new Memory({
  userId: 'user_123',
  embedder: 'local',
  embeddingModel: 'Xenova/all-MiniLM-L6-v2', // 384 dims, ~25 MB
})

OpenAI

const memory = new Memory({
  userId: 'user_123',
  embedder: 'openai',
  openaiApiKey: process.env.OPENAI_API_KEY,
  embeddingModel: 'text-embedding-3-small',
})

Custom Embedder

const memory = new Memory({
  userId: 'user_123',
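  embedder: async (text: string): Promise<number[]> => {
    const res = await fetch('https://my-api.com/embed', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ text }),
    })
    // Fail loudly on HTTP errors instead of trying to parse an error page
    if (!res.ok) throw new Error(`Embedding request failed: ${res.status}`)
    return res.json() // assumes the endpoint responds with a bare JSON array of numbers
  },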
})
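
A custom embedder talks to code the library does not control, so it can help to validate what comes back. The sketch below wraps any embedder with a dimension check. `EmbedFn`, `assertEmbedding`, and `withDimensionCheck` are hypothetical names defined here for illustration (the local `EmbedFn` type simply mirrors the `(text: string) => Promise<number[]>` shape used above), and the value 384 in the usage note is the dimension of the default local model.

```typescript
// Local stand-in for the embedder signature shown above.
type EmbedFn = (text: string) => Promise<number[]>

// Throws unless `vector` is an array of exactly `expectedDims` finite numbers.
function assertEmbedding(vector: unknown, expectedDims: number): number[] {
  if (!Array.isArray(vector) || vector.length !== expectedDims) {
    throw new Error(`Expected an array of ${expectedDims} numbers`)
  }
  if (!vector.every(v => typeof v === 'number' && Number.isFinite(v))) {
    throw new Error('Embedding contains non-numeric or non-finite values')
  }
  return vector as number[]
}

// Wraps an embedder so malformed responses fail fast instead of being stored.
function withDimensionCheck(embed: EmbedFn, expectedDims: number): EmbedFn {
  return async text => assertEmbedding(await embed(text), expectedDims)
}
```

A wrapped function could then be passed as the `embedder` option, for example `embedder: withDimensionCheck(myEmbedder, 384)`, where `myEmbedder` is your own function.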

Full Configuration

const memory = new Memory({
  // ─── Required ──────────────────────────────────
  userId: 'user_123',

  // ─── Storage ───────────────────────────────────
  storage: 'sqlite',            // 'sqlite' | StorageAdapter
  dbPath: './semantic-recall.db',

  // ─── Embedder ──────────────────────────────────
  embedder: 'local',            // 'local' | 'openai' | EmbedderFunction
  embeddingModel: 'Xenova/all-MiniLM-L6-v2',
  openaiApiKey: '...',          // Required if embedder: 'openai'

  // ─── Behavior ──────────────────────────────────
  namespace: 'default',
  dedupThreshold: 0.92,         // Cosine sim threshold for dedup (0–1)
  recallThreshold: 0.70,        // Min similarity to return (0–1)
  topK: 5,                      // Max results per recall()

  // ─── Reliability ───────────────────────────────
  maxAttempts: 3,               // Attempts before a job is marked dead
  retryIntervalMs: 30_000,      // Retry scheduler interval

  // ─── LLM Auto-Extraction ──────────────────────
  llmProvider: 'openai',        // 'openai' | 'gemini' | 'claude' | LLMFunction
  llmApiKey: '...',
  llmModel: 'gpt-4o-mini',
})
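
To show how recallThreshold and topK interact, here is a minimal sketch of the selection step, assuming each stored memory has already been scored against the query. `ScoredMemory` and `selectMatches` are hypothetical names, and the similarity values in the example are made up for illustration.

```typescript
interface ScoredMemory {
  content: string
  similarity: number
}

// Keeps matches at or above the threshold, best first, capped at topK.
function selectMatches(scored: ScoredMemory[], recallThreshold = 0.7, topK = 5): string[] {
  return scored
    .filter(m => m.similarity >= recallThreshold) // filter() copies, so the input is not mutated by sort()
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, topK)
    .map(m => m.content)
}

const example = selectMatches([
  { content: 'User lives in San Francisco', similarity: 0.31 },
  { content: 'User uses a high-contrast theme', similarity: 0.78 },
  { content: 'User prefers dark mode', similarity: 0.84 },
])
// example → ['User prefers dark mode', 'User uses a high-contrast theme']
```

The threshold controls precision (weak matches never reach the prompt), while topK bounds how much prompt space recalled facts can take up.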

API Reference

Method	Returns	Description
`memory.remember(text, opts?)`	`void`	Store a memory. Fire-and-forget, never throws.
`memory.rememberAndWait(text, opts?)`	`Promise<RememberResult>`	Store and wait. Returns `{ saved, duplicate }`.
`memory.recall(query, opts?)`	`Promise<string[]>`	Semantic search. Returns content strings.
`memory.recallDetailed(query, opts?)`	`Promise<MemoryResult[]>`	Like recall but with similarity scores + metadata.
`memory.extractAndRemember(messages, opts?)`	`Promise<void>`	LLM-powered fact extraction from conversations.
`memory.forget(memoryId)`	`Promise<void>`	Delete a specific memory.
`memory.forgetAll(opts?)`	`Promise<void>`	Delete all memories for user+namespace.
`memory.list(opts?)`	`Promise<MemoryResult[]>`	List all stored memories (no search).
`memory.getDeadJobs()`	`Promise<MemoryJob[]>`	Inspect failed jobs.
`memory.retryDead(jobId)`	`Promise<void>`	Retry a dead job.
`memory.cleanup(opts?)`	`Promise<{ deleted }>`	Prune old done jobs from queue.
`memory.destroy()`	`void`	Stop scheduler, close DB.

Events

Event	Payload	When
`memory:saved`	`{ jobId, content, replayed?, retried? }`	Memory stored successfully
`memory:duplicate`	`{ content }`	Duplicate detected, memory skipped
`memory:retry`	`{ jobId, content, error, attempts }`	Job failed, will retry
`memory:dead`	`{ jobId, content, error, attempts }`	Job exhausted all retries

Types

All types are exported for TypeScript consumers:

import type {
  MemoryOptions,
  RememberOptions,
  RecallOptions,
  MemoryResult,
  RememberResult,
  MemoryJob,
  StorageAdapter,
  EmbedderFunction,
  ConversationMessage,
  LLMFunction,
  MemorySavedEvent,
  MemoryRetryEvent,
  MemoryDeadEvent,
} from 'semantic-recall'

Real-World Patterns

Inject Context Into Any LLM

import OpenAI from 'openai'
import { Memory } from 'semantic-recall'

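// extractAndRemember() below needs an LLM provider (see LLM Auto-Extraction)
const memory = new Memory({
  userId: 'user_123',
  llmProvider: 'openai',
  llmApiKey: process.env.OPENAI_API_KEY,
})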
const openai = new OpenAI()

async function chat(userMessage: string) {
  // Recall relevant memories
  const context = await memory.recall(userMessage)

  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      {
        role: 'system',
        content: `You are a helpful assistant.
Known facts about the user:
${context.map(f => `- ${f}`).join('\n')}`,
      },
      { role: 'user', content: userMessage },
    ],
  })

  const reply = response.choices[0].message.content ?? '' // content can be null

  // Auto-extract facts from this exchange
  await memory.extractAndRemember([
    { role: 'user', content: userMessage },
    { role: 'assistant', content: reply },
  ])

  return reply
}
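
When recall() returns no matches, the template above still emits a "Known facts about the user:" heading with nothing under it. A small helper can drop that section in the empty case. `buildSystemPrompt` is a hypothetical function sketched for illustration, not something the package provides.

```typescript
// Builds the system prompt, omitting the facts section when nothing was recalled.
function buildSystemPrompt(facts: string[], base = 'You are a helpful assistant.'): string {
  if (facts.length === 0) return base
  const bullets = facts.map(f => `- ${f}`).join('\n')
  return `${base}\nKnown facts about the user:\n${bullets}`
}
```

In the chat() function above, the system message content could then be written as `buildSystemPrompt(context)`.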

Graceful Shutdown

process.on('SIGTERM', () => {
  memory.destroy() // Stops retry scheduler, closes DB
  process.exit(0)
})

Dead Job Monitoring

// In a health check endpoint (assumes an Express-style `app`)
app.get('/health/memory', async (req, res) => {
  const dead = await memory.getDeadJobs()
  res.json({
    status: dead.length === 0 ? 'healthy' : 'degraded',
    deadJobs: dead.length,
  })
})

Comparison Deep Dive

vs Mem0

Mem0 is a managed memory platform (cloud-hosted or self-hosted). It's a great product if you want a managed service — but it requires API keys for the cloud version and Docker + Redis for self-hosting. semantic-recall runs entirely locally with npm install and zero infrastructure.

vs Zep

Zep is a temporal knowledge graph server. It's architecturally different — it tracks how facts change over time using a graph model. Powerful, but requires running a separate server with PostgreSQL and Redis. semantic-recall is an embedded library that lives inside your process.

vs LangChain Memory

LangChain's memory modules store raw conversation history (not facts). They are in-memory by default (lost on restart), don't do semantic search, and are part of a large framework. semantic-recall is a focused, standalone package that persists extracted facts with semantic retrieval.

Contributing

We welcome contributions! See our Contributing Guide for:

Development setup and project structure
Coding standards and commit conventions
PR process and templates
High-impact contribution ideas (new adapters, batch ops, streaming, metadata)

Requirements

Node.js ≥ 18.0.0
OS: Windows, macOS, Linux

License

MIT — free forever.

Built with care for the AI developer community.
If this saved you time, consider giving it a ⭐ on GitHub.

🤖 AI/LLM tool or crawler? See llms.txt for a structured summary of this package.
