Skt329/semantic-recall


🧠 semantic-recall

Give your AI a brain that remembers.
Persistent semantic memory for LLM apps — zero config, zero API keys, two methods.



Every LLM chatbot forgets everything between sessions. Users repeat themselves. Context is lost. semantic-recall fixes this — in two lines of code.

import { Memory } from 'semantic-recall'

const memory = new Memory({ userId: 'user_123' })

memory.remember("User is vegetarian and allergic to nuts")

const facts = await memory.recall("What should I recommend for dinner?")
// → ["User is vegetarian and allergic to nuts"]

No vector database. No API keys. No Docker containers. Just npm install and go.


Why semantic-recall?

Most memory solutions require you to set up infrastructure, manage API keys, or lock into a paid platform. semantic-recall is different:

Feature semantic-recall Mem0 Zep LangChain Memory
npm install & go ❌ Requires API key or self-host setup ❌ Requires server (Docker)
Works offline ✅ Local embeddings ❌ Cloud API calls ❌ Server required ❌ No built-in embeddings
Persistent across sessions ✅ SQLite on disk ✅ Cloud-managed ✅ Server-managed ❌ In-memory by default
Semantic search ✅ Cosine similarity ✅ Knowledge graph ❌ Keyword/buffer only
Auto deduplication ✅ Configurable threshold
Crash recovery ✅ Persistent queue
Worker thread isolation ✅ CPU never blocks N/A (separate server)
TTL / auto-expiry "7d", "1h"
Multi-tenant ✅ userId + namespace ✅ user/session/agent ✅ Sessions
Bundle size ~67 KB Cloud SDK Cloud SDK Large framework
Free & open-source ✅ MIT, forever Freemium (paid tiers) Freemium (credit-based) ✅ MIT
Self-contained ✅ Single package ❌ Platform dependency ❌ Server + Redis + Postgres ❌ Framework dependency

TL;DR — semantic-recall is the only solution that gives you persistent, semantic, crash-safe memory with zero infrastructure and zero API keys out of the box.


Installation

npm install semantic-recall

First-run note: The initial call downloads a ~25 MB embedding model to a local cache. After that, everything runs offline with zero network calls.


Works Great With

  • OpenAI Node SDK — inject recalled facts directly into your messages[] array
  • Vercel AI SDK — wrap recall() as a tool call for streaming chat apps
  • LangChain JS — use as a persistent, semantic drop-in memory module
  • Turso — serverless edge storage adapter built-in
  • Supabase — Postgres storage adapter built-in
  • Transformers.js — powers the local offline embeddings under the hood

Quick Start

The Basics — remember() and recall()

import { Memory } from 'semantic-recall'

const memory = new Memory({ userId: 'user_123' })

// Store memories (fire-and-forget — returns instantly, never throws)
memory.remember("User prefers dark mode")
memory.remember("User is a senior TypeScript developer")
memory.remember("User lives in San Francisco")

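// Retrieve relevant context for your LLM prompt.
// remember() only queues the write, so use rememberAndWait() if a fact must be stored before you recall it.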
const context = await memory.recall("What IDE theme should I suggest?")
// → ["User prefers dark mode"]

// Inject into your system prompt
const systemPrompt = `You are a helpful assistant.
Known facts about the user:
${context.map(f => `- ${f}`).join('\n')}`
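
When recall() returns an empty array, the template above still emits the "Known facts about the user:" header with nothing under it. A minimal sketch of a formatting helper that drops the section in that case follows. `buildSystemPrompt` is a hypothetical name and is not part of semantic-recall.

```typescript
// Hypothetical helper (not part of semantic-recall): formats recalled facts
// for a system prompt and omits the facts section when nothing was recalled.
function buildSystemPrompt(facts: string[]): string {
  const base = 'You are a helpful assistant.'
  if (facts.length === 0) return base

  // One bullet per recalled fact, matching the template used above
  const bullets = facts.map(f => `- ${f}`).join('\n')
  return `${base}\nKnown facts about the user:\n${bullets}`
}
```

Pass the result of `await memory.recall(...)` straight into the helper and use the returned string as the system message.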

Synchronous Confirmation

const result = await memory.rememberAndWait("User is vegetarian")
console.log(result) // → { saved: true, duplicate: false }

const result2 = await memory.rememberAndWait("User is vegetarian")
console.log(result2) // → { saved: false, duplicate: true }

Namespaces — Organize by Topic

const memory = new Memory({ userId: 'user_123', namespace: 'health' })

memory.remember("User is allergic to peanuts")

// Only searches the 'health' namespace
const health = await memory.recall("allergies")

// Override the namespace for a single call: 'work' holds no memories yet
const work = await memory.recall("allergies", { namespace: 'work' }) // → []

TTL — Auto-Expiring Memories

// Memory expires after 7 days
memory.remember("User is in Paris for a conference", { ttl: "7d" })

// Supported formats: '500ms', '60s', '30m', '12h', '7d'
memory.remember("Session preference: compact view", { ttl: "1h" })
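
To make the duration formats concrete, here is a rough sketch of how strings such as '500ms' or '7d' could be converted to milliseconds and an expiry timestamp. `parseTtl` and `expiresAt` are hypothetical names used for illustration; the library's internal parsing may differ.

```typescript
// Milliseconds per unit for the formats listed above: ms, s, m, h, d
const UNIT_MS: Record<string, number> = {
  ms: 1,
  s: 1_000,
  m: 60_000,
  h: 3_600_000,
  d: 86_400_000,
}

// Hypothetical parser: '500ms' -> 500, '7d' -> 604800000
function parseTtl(ttl: string): number {
  const match = /^(\d+)(ms|s|m|h|d)$/.exec(ttl.trim())
  if (!match) throw new Error(`Unsupported TTL format: "${ttl}"`)
  return Number(match[1]) * UNIT_MS[match[2]]
}

// Expiry timestamp (epoch milliseconds) for a memory stored at `now`
function expiresAt(ttl: string, now: number = Date.now()): number {
  return now + parseTtl(ttl)
}
```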

LLM Auto-Extraction

Automatically extract memorable facts from conversations:

const memory = new Memory({
  userId: 'user_123',
  llmProvider: 'openai',
  llmApiKey: process.env.OPENAI_API_KEY,
})

await memory.extractAndRemember([
  { role: 'user', content: "I just moved to Tokyo from London" },
  { role: 'assistant', content: "Welcome to Tokyo! How exciting..." },
  { role: 'user', content: "Yeah, I'm starting a new job as an ML engineer at Google" },
])
// Automatically extracts and stores:
// → "User lives in Tokyo"
// → "User previously lived in London"
// → "User works as an ML engineer at Google"

Supported providers: 'openai' · 'gemini' · 'claude' · or any custom LLMFunction.


How It Works

remember("user is vegetarian")
         │
         ▼
  ┌─────────────┐     ┌──────────────┐     ┌─────────────────┐
  │   Enqueue    │────▶│  Embed Text  │────▶│  Dedup Check    │
  │ (persistent  │     │ (worker      │     │ (cosine sim     │
  │  queue)      │     │  thread)     │     │  ≥ 0.92?)       │
  └─────────────┘     └──────────────┘     └────────┬────────┘
                                                     │
                                            ┌────────┴────────┐
                                            │                 │
                                       Unique            Duplicate
                                            │                 │
                                            ▼                 ▼
                                     ┌────────────┐    ┌────────────┐
                                     │   INSERT    │    │   Skip     │
                                     │ + emit      │    │ (mark done)│
                                     │ memory:saved│    │            │
                                     └────────────┘    └────────────┘
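
The dedup step in the diagram compares the new embedding against stored ones by cosine similarity. The sketch below shows that comparison under the assumption that embeddings are plain number arrays. `cosineSimilarity` and `isDuplicate` are illustrative names, not the library's exports, and the 0.92 default mirrors the threshold shown above.

```typescript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vector length mismatch')
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  // A zero vector has no direction, so treat it as dissimilar to everything
  if (normA === 0 || normB === 0) return 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Hypothetical dedup check: true if any stored vector meets the threshold
function isDuplicate(candidate: number[], stored: number[][], threshold = 0.92): boolean {
  return stored.some(vec => cosineSimilarity(candidate, vec) >= threshold)
}
```

A linear scan like this is only a sketch of the idea; how the library selects the stored vectors it compares against is not shown here.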

Reliability — Built Like Infrastructure

Every call to remember() is crash-safe. Memories are first written to a persistent pending_memories queue, then processed asynchronously. If your process crashes mid-pipeline, the queued job survives and moves through these states once the process is back:

PENDING ──▶ PROCESSING ──▶ DONE
                │
                ▼
             FAILED ──(exponential backoff)──▶ PENDING
                │
                ▼ (after max attempts)
              DEAD ──(manual retry)──▶ PENDING

  • Stale recovery: On startup, stuck PROCESSING jobs are automatically reset to PENDING
  • Exponential backoff: Failed jobs retry with 2^n second delays (2s → 4s → 8s)
  • Dead letter queue: After max attempts, jobs move to DEAD for manual inspection
  • Never throws: remember() swallows all errors — your app never crashes because of memory storage
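
The retry rules above can be sketched as three small pure functions. This is an illustration under stated assumptions rather than the library's code: the function names are hypothetical, attempts are counted from 1, and a job is treated as dead once its attempt count reaches maxAttempts.

```typescript
type JobStatus = 'PENDING' | 'PROCESSING' | 'DONE' | 'FAILED' | 'DEAD'

// Delay before retry number `attempt` (1-based): 2s, 4s, 8s, ...
function backoffDelayMs(attempt: number): number {
  return 2 ** attempt * 1_000
}

// Status after a processing attempt fails: FAILED while retries remain,
// DEAD once the attempt count reaches the limit (assumed semantics)
function statusAfterFailure(attempts: number, maxAttempts = 3): JobStatus {
  return attempts >= maxAttempts ? 'DEAD' : 'FAILED'
}

// Stale recovery on startup: anything left in PROCESSING goes back to PENDING
function recoverStale(statuses: JobStatus[]): JobStatus[] {
  return statuses.map(s => (s === 'PROCESSING' ? 'PENDING' : s))
}
```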

Observability

Real-time events for monitoring and debugging:

memory.on('memory:saved', ({ content, jobId }) => {
  console.log(`✓ Saved: "${content}" (id: ${jobId})`)
})

memory.on('memory:duplicate', ({ content }) => {
  console.log(`⊘ Duplicate skipped: "${content}"`)
})

memory.on('memory:retry', ({ content, error, attempts }) => {
  console.warn(`↻ Retry #${attempts}: "${content}" — ${error}`)
})

memory.on('memory:dead', ({ content, error }) => {
  console.error(`☠ Dead: "${content}" — ${error}`)
})

Storage Adapters

SQLite (Default) — Zero Config

Works everywhere with a filesystem. WAL mode enabled for concurrent reads.

const memory = new Memory({
  userId: 'user_123',
  dbPath: './my-memories.db', // default: './semantic-recall.db'
})

Turso — Serverless Edge

For serverless and edge deployments with Turso:

npm install @libsql/client

import { Memory } from 'semantic-recall'
import { createTursoAdapter } from 'semantic-recall/adapters/storage/turso'

const memory = new Memory({
  userId: 'user_123',
  storage: createTursoAdapter({
    url: 'libsql://your-db.turso.io',
    authToken: 'your-token',
  }),
})

Supabase — Postgres Scale

For production Postgres deployments with Supabase:

npm install @supabase/supabase-js

import { Memory } from 'semantic-recall'
import { createSupabaseAdapter } from 'semantic-recall/adapters/storage/supabase'

const memory = new Memory({
  userId: 'user_123',
  storage: createSupabaseAdapter({
    url: 'https://your-project.supabase.co',
    anonKey: 'your-anon-key',
    dimensions: 384,
  }),
})

Custom Adapter

Implement the StorageAdapter interface for any backend:

import { Memory, type StorageAdapter } from 'semantic-recall'

const myAdapter: StorageAdapter = {
  async init() { /* create tables */ },
  async insertMemory(params) { /* insert */ },
  async searchMemories(params) { /* return all rows */ },
  async deleteMemory(id) { /* delete by id */ },
  async deleteAllMemories(userId, namespace) { /* bulk delete */ },
  async listMemories(userId, namespace, limit) { /* list */ },
  async pruneExpired(userId) { /* remove expired */ },
  async enqueue(job) { /* queue job, return id */ },
  async markProcessing(jobId) { /* update status */ },
  async markDone(jobId) { /* update status */ },
  async markFailed(jobId, error) { /* update status + backoff */ },
  async getRetryable() { /* return pending/failed jobs */ },
  async getDeadJobs(userId) { /* return dead jobs */ },
  async resetStaleProcessing() { /* crash recovery */ },
  async cleanupDoneJobs(olderThanMs) { /* prune */ },
  async retryDeadJob(jobId) { /* reset dead → pending */ },
  close() { /* cleanup */ },
}

const memory = new Memory({ userId: 'user_123', storage: myAdapter })

Embedder Adapters

Local (Default) — No API Keys

Uses Transformers.js in an isolated worker thread. The main thread is never blocked.

const memory = new Memory({
  userId: 'user_123',
  embedder: 'local',
  embeddingModel: 'Xenova/all-MiniLM-L6-v2', // 384 dims, ~25 MB
})

OpenAI

const memory = new Memory({
  userId: 'user_123',
  embedder: 'openai',
  openaiApiKey: process.env.OPENAI_API_KEY,
  embeddingModel: 'text-embedding-3-small',
})

Custom Embedder

const memory = new Memory({
  userId: 'user_123',
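  embedder: async (text: string): Promise<number[]> => {
    const res = await fetch('https://my-api.com/embed', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ text }),
    })
    // Fail loudly on HTTP errors instead of parsing an error body as a vector
    if (!res.ok) throw new Error(`Embedding request failed: ${res.status}`)
    return res.json() // assumes the endpoint responds with a bare number[]
  },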
})
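
A custom embedder that returns vectors of the wrong length can quietly break similarity search, and adapters such as the Supabase one take a fixed `dimensions` value. One possible guard is a wrapper that validates each vector before it is returned. `withDimensionCheck` and the local `EmbedFn` type are hypothetical; `EmbedFn` mirrors the `(text: string) => Promise<number[]>` shape used in the example above.

```typescript
// Local stand-in for the embedder shape shown above (assumption)
type EmbedFn = (text: string) => Promise<number[]>

// Hypothetical wrapper: rejects vectors with the wrong length or non-finite values
function withDimensionCheck(embed: EmbedFn, dimensions: number): EmbedFn {
  return async (text: string): Promise<number[]> => {
    const vector = await embed(text)
    if (!Array.isArray(vector) || vector.length !== dimensions) {
      const got = Array.isArray(vector) ? `${vector.length} values` : typeof vector
      throw new Error(`Expected ${dimensions} dimensions, got ${got}`)
    }
    if (!vector.every(v => Number.isFinite(v))) {
      throw new Error('Embedding contains non-finite values')
    }
    return vector
  }
}
```

Wrapping the fetch-based embedder as `withDimensionCheck(myEmbedder, 384)` would surface a mismatched model early, at the cost of one extra pass over each vector.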

Full Configuration

const memory = new Memory({
  // ─── Required ──────────────────────────────────
  userId: 'user_123',

  // ─── Storage ───────────────────────────────────
  storage: 'sqlite',            // 'sqlite' | StorageAdapter
  dbPath: './semantic-recall.db',

  // ─── Embedder ──────────────────────────────────
  embedder: 'local',            // 'local' | 'openai' | EmbedderFunction
  embeddingModel: 'Xenova/all-MiniLM-L6-v2',
  openaiApiKey: '...',          // Required if embedder: 'openai'

  // ─── Behavior ──────────────────────────────────
  namespace: 'default',
  dedupThreshold: 0.92,         // Cosine sim threshold for dedup (0–1)
  recallThreshold: 0.70,        // Min similarity to return (0–1)
  topK: 5,                      // Max results per recall()

  // ─── Reliability ───────────────────────────────
  maxAttempts: 3,                // Retries before marking dead
  retryIntervalMs: 30_000,      // Retry scheduler interval

  // ─── LLM Auto-Extraction ──────────────────────
  llmProvider: 'openai',        // 'openai' | 'gemini' | 'claude' | LLMFunction
  llmApiKey: '...',
  llmModel: 'gpt-4o-mini',
})
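
To show how recallThreshold and topK interact, here is a small sketch of a ranking step over already-scored candidates. The `Scored` shape and `selectTopK` are assumptions made for illustration; the defaults mirror the values in the configuration above.

```typescript
interface Scored {
  content: string
  similarity: number // cosine similarity to the query, 0 to 1
}

// Hypothetical ranking step: keep results at or above the threshold,
// order them best first, and cap the list at topK
function selectTopK(candidates: Scored[], recallThreshold = 0.7, topK = 5): Scored[] {
  return candidates
    .filter(c => c.similarity >= recallThreshold) // filter() copies, so the input is not mutated
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, topK)
}
```

Raising recallThreshold trades recall for precision, while topK bounds how much text ends up in the prompt regardless of how many memories qualify.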

API Reference

Method Returns Description
memory.remember(text, opts?) void Store a memory. Fire-and-forget, never throws.
memory.rememberAndWait(text, opts?) Promise<RememberResult> Store and wait. Returns { saved, duplicate }.
memory.recall(query, opts?) Promise<string[]> Semantic search. Returns content strings.
memory.recallDetailed(query, opts?) Promise<MemoryResult[]> Like recall but with similarity scores + metadata.
memory.extractAndRemember(messages, opts?) Promise<void> LLM-powered fact extraction from conversations.
memory.forget(memoryId) Promise<void> Delete a specific memory.
memory.forgetAll(opts?) Promise<void> Delete all memories for user+namespace.
memory.list(opts?) Promise<MemoryResult[]> List all stored memories (no search).
memory.getDeadJobs() Promise<MemoryJob[]> Inspect dead jobs (those that exhausted all retries).
memory.retryDead(jobId) Promise<void> Retry a dead job.
memory.cleanup(opts?) Promise<{ deleted }> Prune old done jobs from queue.
memory.destroy() void Stop scheduler, close DB.

Events

Event Payload When
memory:saved { jobId, content, replayed?, retried? } Memory stored successfully
memory:retry { jobId, content, error, attempts } Job failed, will retry
memory:dead { jobId, content, error, attempts } Job exhausted all retries

Types

All types are exported for TypeScript consumers:

import type {
  MemoryOptions,
  RememberOptions,
  RecallOptions,
  MemoryResult,
  RememberResult,
  MemoryJob,
  StorageAdapter,
  EmbedderFunction,
  ConversationMessage,
  LLMFunction,
  MemorySavedEvent,
  MemoryRetryEvent,
  MemoryDeadEvent,
} from 'semantic-recall'

Real-World Patterns

Inject Context Into Any LLM

import OpenAI from 'openai'
import { Memory } from 'semantic-recall'

const memory = new Memory({ userId: 'user_123' })
const openai = new OpenAI()

async function chat(userMessage: string) {
  // Recall relevant memories
  const context = await memory.recall(userMessage)

  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      {
        role: 'system',
        content: `You are a helpful assistant.
Known facts about the user:
${context.map(f => `- ${f}`).join('\n')}`,
      },
      { role: 'user', content: userMessage },
    ],
  })

  // content can be null, so fall back to an empty string rather than asserting
  const reply = response.choices[0].message.content ?? ''

  // Auto-extract facts from this exchange
  await memory.extractAndRemember([
    { role: 'user', content: userMessage },
    { role: 'assistant', content: reply },
  ])

  return reply
}

Graceful Shutdown

process.on('SIGTERM', () => {
  memory.destroy() // Stops retry scheduler, closes DB
  process.exit(0)
})

Dead Job Monitoring

// In a health check endpoint (assumes `app` is an Express-style server instance)
app.get('/health/memory', async (req, res) => {
  const dead = await memory.getDeadJobs()
  res.json({
    status: dead.length === 0 ? 'healthy' : 'degraded',
    deadJobs: dead.length,
  })
})

Comparison Deep Dive

vs Mem0

Mem0 is a managed memory platform (cloud-hosted or self-hosted). It's a great product if you want a managed service — but it requires API keys for the cloud version and Docker + Redis for self-hosting. semantic-recall runs entirely locally with npm install and zero infrastructure.

vs Zep

Zep is a temporal knowledge graph server. It's architecturally different — it tracks how facts change over time using a graph model. Powerful, but requires running a separate server with PostgreSQL and Redis. semantic-recall is an embedded library that lives inside your process.

vs LangChain Memory

LangChain's memory modules store raw conversation history (not facts). They are in-memory by default (lost on restart), don't do semantic search, and are part of a large framework. semantic-recall is a focused, standalone package that persists extracted facts with semantic retrieval.


Contributing

We welcome contributions! See our Contributing Guide for:

  • Development setup and project structure
  • Coding standards and commit conventions
  • PR process and templates
  • High-impact contribution ideas (new adapters, batch ops, streaming, metadata)


Requirements

  • Node.js ≥ 18.0.0
  • OS: Windows, macOS, Linux

License

MIT — free forever.


Built with care for the AI developer community.
If this saved you time, consider giving it a ⭐ on GitHub.

🤖 AI/LLM tool or crawler? See llms.txt for a structured summary of this package.
