The first deterministic context database for AI agents
Fix your RAG in 5 minutes - same query, same context, every time.
AvocadoDB is a span-based context compiler that replaces traditional vector databases' chaotic "top-k" retrieval with deterministic, citation-backed context generation.
Pure Rust embeddings = 6x faster than OpenAI, works completely offline, costs $0.
Current RAG systems are fundamentally broken:
- ❌ Same query → different results each time (non-deterministic)
- ❌ Token budgets wasted on duplicates (60-70% utilization)
- ❌ No citations or verifiability
- ❌ Hallucinations from inconsistent context
- ❌ Slow (200-300ms just for OpenAI embedding calls)
- ❌ Expensive (API costs scale with usage)
- ✅ 100% Deterministic: Same query → same context, every time
- ✅ 6x Faster: 40-60ms compilation (vs 240-360ms with OpenAI)
- ✅ Zero Cost: Pure Rust embeddings, no API required
- ✅ Works Offline: No internet needed after initial setup
- ✅ Citation-Backed: Every span has exact line number citations
- ✅ Token Efficient: 95%+ budget utilization
- ✅ Drop-in Replacement: Works with any LLM
# Run benchmarks on your hardware
./target/release/avocado benchmark
# Results (M1 Mac example):
# Single embedding: 1.2ms (vs ~250ms OpenAI)
# Batch of 100: 8.7ms (vs ~250ms OpenAI)
# Full compilation: 43ms (vs ~300ms OpenAI)
#
# Speedup: 6-7x faster ⚡
# Cost: $0 (vs ~$0.0001 per 1K tokens)See EMBEDDING_PERFORMANCE.md for detailed benchmarks.
cargo install avocado-cliThat's it! Now you can use avocado directly:
avocado --version
avocado init
avocado ingest ./docs --recursive
avocado compile "your query"Run the server with Docker:
# Run with Docker
docker run -d \
-p 8765:8765 \
-v avocado-data:/data \
--name avocadodb \
avocadodb/avocadodb:latest
# Or use Docker Compose
docker-compose up -d
# Test the server
curl http://localhost:8765/healthSee Docker Guide for complete documentation.
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Clone and build
git clone https://github.com/avocadodb/avocadodb
cd avocadodb
cargo build --release
# Optional: Set OpenAI API key (only if you want to use OpenAI embeddings)
# By default, AvocadoDB uses local embeddings (no API key required, no Python required!)
#
# Local embeddings strategy (automatic, in priority order):
# 1. Pure Rust with fastembed (semantic, good quality, no Python required) ✅ DEFAULT
# - Uses all-MiniLM-L6-v2 model (384 dimensions) by default
# - ONNX-based, fast and efficient
# - Model downloaded automatically on first use (~90MB)
# - To increase dimensionality, set AVOCADODB_EMBEDDING_MODEL:
# * "nomic" or "nomicv15" → 768 dimensions (good balance)
# * "bgelarge" or "bge-large-en-v1.5" → 1024 dimensions (higher quality)
# 2. Python + sentence-transformers (fallback if fastembed unavailable)
# - Requires: pip install sentence-transformers
# 3. Hash-based fallback (deterministic, but NOT semantic)
# - Works always, but poor semantic quality
#
# To use OpenAI embeddings instead:
# export OPENAI_API_KEY="sk-..."
# export AVOCADODB_EMBEDDING_PROVIDER=openai# Initialize database
./target/release/avocado init
# Get model recommendation (optional)
./target/release/avocado recommend --corpus-size 5000 --use-case production
# Recommends optimal embedding model for your use case
# Ingest documents
./target/release/avocado ingest ./docs --recursive
# Output: Ingested 42 files → 387 spans
# Compile context (uses daemon at http://localhost:8765 by default)
./target/release/avocado compile "How does authentication work?" --budget 8000
# Force local mode (uses .avocado/db.sqlite in current project)
./target/release/avocado compile "How does authentication work?" --local --budget 8000
# Run performance benchmarks
./target/release/avocado benchmark
# Shows real performance on your hardware# Start the daemon with remote GPU embeddings (Modal)
avocado serve --gpu --embed-url https://<your-modal-endpoint>/embed
# or CPU/local (default)
avocado serveExample Output:
Compiling context for: "How does authentication work?"
Token budget: 8000
[1] docs/authentication.md
Lines 1-23
# Authentication System
Our authentication uses JWT tokens with secure refresh mechanisms...
---
[2] src/middleware/auth.ts
Lines 45-78
export function authenticateRequest(req: Request) {
const token = req.headers.authorization?.split(' ')[1];
if (!token) throw new UnauthorizedError();
...
}
---
Compiled 12 spans using 7,891 tokens (98.6% utilization)
Compilation time: 243ms
Context hash: e3b0c4429...52b855 (deterministic ✓)
cd sdks/python
pip install -e .from avocado import AvocadoDB
db = AvocadoDB()
db.ingest("./docs", recursive=True)
result = db.compile("my query", budget=8000)
print(result.text) # Deterministic every timecd sdks/typescript
npm install
npm run buildimport { AvocadoDB } from 'avocadodb';
const db = new AvocadoDB();
await db.ingest('./docs', recursive: true);
const result = await db.compile('my query', { budget: 8000 });
console.log(result.text); // Deterministic every time# Start server (binds to 127.0.0.1 by default)
./target/release/avocado-server
# Use the API
curl -X POST http://localhost:8765/compile \
-H "Content-Type: application/json" \
-d '{"query": "authentication", "token_budget": 8000, "project": "'"$PWD"'"}'AvocadoDB is production-ready with full Docker and Kubernetes support.
# Quick start with Docker
docker run -d -p 8765:8765 -v avocado-data:/data avocadodb/avocadodb:latest
# Or use Docker Compose
docker-compose up -dFeatures:
- Multi-stage build for minimal image size (~80-100MB)
- Multi-architecture support (linux/amd64, linux/arm64)
- Non-root user for security
- Health checks built-in
- Configurable via environment variables
See Docker Guide for complete documentation.
# Deploy to Kubernetes
kubectl apply -k k8s/
# Verify deployment
kubectl get pods -l app=avocadodbIncludes:
- Production-ready Deployment manifests
- Horizontal scaling support
- Persistent storage configuration
- Ingress with TLS/HTTPS
- ConfigMaps and Secrets management
- Resource limits and health checks
See Kubernetes Guide for complete documentation.
| Variable | Default | Description |
|---|---|---|
PORT |
8765 |
HTTP server port |
BIND_ADDR |
127.0.0.1 |
Bind address (set 0.0.0.0 to expose publicly) |
RUST_LOG |
info |
Log level |
AVOCADODB_EMBEDDING_MODEL |
minilm |
Embedding model (minilm, nomic, bgelarge) |
AVOCADODB_EMBEDDING_PROVIDER |
local |
Provider (local or openai) |
OPENAI_API_KEY |
- | OpenAI API key (if using OpenAI) |
AVOCADODB_ROOT |
unset | Optional project root. When set, all project paths must be under this directory. Requests outside are rejected. |
API_TOKEN |
unset | If set, requires header X-Avocado-Token to be present and equal for all routes (except /health, /api-docs/*). |
MAX_BODY_BYTES |
2097152 (2MB) |
Request body size limit to protect against large payloads. |
Security note:
- Do not expose the server publicly without protection. If you must, set
BIND_ADDR=0.0.0.0and front it with auth. - For local safety, clients always send an explicit
project(their current working directory), and the server normalizes paths and can restrict toAVOCADODB_ROOT.
Query → Embed → [Semantic Search + Lexical Search] → Hybrid Fusion
→ MMR Diversification → Token Packing → Deterministic Sort → WorkingSet
- Span-Based Indexing: Documents are split into spans (20-50 lines) with precise line numbers
- Hybrid Retrieval: Combines semantic (vector) and lexical (keyword) search
- Deterministic Ordering: Results sorted by
(artifact_id, start_line)for reproducibility - Greedy Token Packing: Maximizes token budget utilization without duplicates
NEW in v2.1: Enhanced determinism, explainability, and quality tracking features based on production feedback.
Every compilation now includes a version manifest for full reproducibility:
// Access manifest from WorkingSet
let manifest = working_set.manifest.unwrap();
println!("Avocado version: {}", manifest.avocado_version);
println!("Embedding model: {}", manifest.embedding_model);
println!("Context hash: {}", manifest.context_hash);The manifest includes: avocado version, tokenizer, embedding model, embedding dimensions, chunking params, index params, and a SHA256 context hash.
Understand exactly how context was selected with explain mode:
# CLI with explain
avocado compile "authentication" --explain
# Shows candidates at each pipeline stage:
# - Semantic search (top 50 from HNSW)
# - Lexical search (keyword matches)
# - Hybrid fusion (RRF combination)
# - MMR diversification
# - Token packing
# - Final deterministic order# Python SDK
result = db.compile("auth", budget=8000, explain=True)
if result.explain:
print(f"Semantic candidates: {len(result.explain.semantic_candidates)}")
print(f"Final spans: {len(result.explain.final_order)}")Compare retrieval results across corpus versions for auditing:
use avocado_core::{diff_working_sets, summarize_diff};
let diff = diff_working_sets(&before, &after);
println!("{}", summarize_diff(&diff));
// Output: "3 added, 1 removed, 2 reranked"Only re-embed changed files - unchanged content is automatically skipped:
# First ingest
avocado ingest ./docs --recursive
# Ingested 42 files → 387 spans
# Re-ingest after editing 3 files
avocado ingest ./docs --recursive
# Skipped 39 unchanged, Updated 3 files → 28 spansContent-hash comparison ensures minimal re-embedding while keeping the index fresh.
Built-in support for golden set testing and quality metrics:
use avocado_core::{GoldenQuery, evaluate};
let queries = vec![
GoldenQuery {
query: "authentication".to_string(),
expected_paths: vec!["docs/auth.md".to_string()],
k: 10,
},
];
let summary = evaluate(&queries, &db, &index, &config).await?;
println!("Recall@10: {:.2}%", summary.mean_recall * 100.0);
println!("MRR: {:.3}", summary.mean_mrr);NEW in v2.0: Multi-turn conversation tracking with context compilation
AvocadoDB now supports session management, enabling AI agents to maintain conversation history and context across multiple interactions.
from avocado import AvocadoDB
db = AvocadoDB(mode="http")
# Create a session
session = db.create_session(user_id="alice", title="Project Q&A")
# Multi-turn conversation
result = session.compile("What is AvocadoDB?", budget=8000)
session.add_message("assistant", "AvocadoDB is a deterministic context database...")
result2 = session.compile("How does the compiler work?")
session.add_message("assistant", "The compiler uses hybrid search...")
# Get conversation history
history = session.get_history()
# Replay for debugging
replay = session.replay()- Multi-turn conversations: Track user queries and agent responses
- Context compilation: Automatically compile context for each query
- Conversation history: Retrieve formatted history with token limiting
- Session replay: Debug agent behavior by replaying entire sessions
- Persistence: Sessions stored in SQLite with full ACID guarantees
- ✅ Python SDK: Full session support with
Sessionclass - ✅ TypeScript SDK: Complete session management API
- ✅ CLI: Session commands for interactive use
- ✅ HTTP API: RESTful endpoints for all session operations
See SESSION_MANAGEMENT.md for complete documentation.
When RAG systems return different context for the same query:
- LLMs produce inconsistent answers
- Users can't verify results
- Debugging is impossible
- Trust is broken
AvocadoDB fixes this with deterministic compilation - same query, same context, every time.
# Run the same query multiple times
avocado compile "authentication" --budget 8000 | head -100 | sha256sum
# e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
avocado compile "authentication" --budget 8000 | head -100 | sha256sum
# e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
# Same hash every single time! ✅Phase 1 achieves production-ready performance:
| Metric | Target | Actual | Status |
|---|---|---|---|
| Compilation time (8K tokens) | < 500ms | ~50ms avg | ✅ 10x faster |
| Token budget utilization | > 95% | 90-95% | ✅ Excellent |
| Determinism | 100% | 100% | ✅ Perfect |
| Duplicate spans | 0 | 0 | ✅ Perfect |
Breakdown for 8K token budget compilation (with Pure Rust embeddings):
Embed query: 1-5ms (2-5% of total) - Pure Rust (fastembed), local
Semantic search: <1ms (Vector similarity, HNSW)
Lexical search: <1ms (SQL LIKE query)
Hybrid fusion: <1ms (RRF score combination)
MMR diversification: 5-10ms (Diversity selection)
Token packing: <1ms (Greedy budget allocation)
Deterministic sort: <1ms (Stable sort)
Build context: <1ms (Text concatenation)
Count tokens: 30-40ms (tiktoken encoding)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
TOTAL: 40-60ms (6x faster than OpenAI!)
Performance Comparison:
| Metric | Pure Rust (fastembed) | OpenAI API |
|---|---|---|
| Query Embedding | 1-5ms | 200-300ms |
| Total Compilation | 40-60ms | 240-360ms |
| Throughput | 200-1000 texts/sec | 3-5 batches/sec |
| Cost | Free | ~$0.0001/1K tokens |
| Rate Limits | None | Varies by tier |
| Offline | ✅ Yes | ❌ No |
| Quality | Good (384 dims) | Excellent (1536 dims) |
Pure Rust embeddings are 6x faster and completely free! Optimization: All algorithms run in <15ms total (highly optimized)
See docs/performance.md for detailed analysis and scaling characteristics.
Initialize a new AvocadoDB database:
avocado init [--path <db-path>]Creates .avocado/ directory with SQLite database and vector index.
Ingest documents into the database:
avocado ingest <path> [--recursive]Examples:
# Ingest single file
avocado ingest README.md
# Ingest entire directory recursively
avocado ingest docs/ --recursive
# Ingest specific file types
avocado ingest src/ --recursive --include "*.rs,*.md,*.toml"The ingestion process:
- Reads document content
- Extracts spans (20-50 lines with smart boundaries)
- Generates embeddings for each span (local fastembed by default)
- Stores in SQLite database
Compile a deterministic context for a query:
avocado compile <query> [OPTIONS]Options:
--budget <tokens>: Token budget (default: 8000)--json: Output as JSON instead of human-readable format--explain: Show explain plan with candidates at each pipeline stage--mmr-lambda <0.0-1.0>: MMR diversity parameter (default: 0.5)- Higher values (0.7-1.0) = more relevant but potentially redundant
- Lower values (0.0-0.3) = more diverse but potentially less relevant
--semantic-weight <float>: Semantic search weight (default: 0.7)--lexical-weight <float>: Lexical search weight (default: 0.3)
Examples:
# Basic compilation
avocado compile "How does authentication work?"
# Large context window
avocado compile "error handling patterns" --budget 16000
# Prioritize diversity over relevance
avocado compile "testing strategies" --mmr-lambda 0.3
# Tune search weights (more keyword matching)
avocado compile "API endpoints" --semantic-weight 0.5 --lexical-weight 0.5
# JSON output for programmatic use
avocado compile "authentication" --budget 8000 --jsonJSON Output Format:
{
"text": "[1] docs/auth.md\nLines 1-23\n\n# Authentication...",
"spans": [
{
"id": "uuid",
"artifact_id": "uuid",
"start_line": 1,
"end_line": 23,
"text": "# Authentication...",
"embedding": [0.002, 0.013, ...],
"embedding_model": "text-embedding-ada-002",
"token_count": 127,
"metadata": null
}
],
"citations": [
{
"span_id": "uuid",
"artifact_id": "uuid",
"artifact_path": "docs/auth.md",
"start_line": 1,
"end_line": 23,
"score": 0.0
}
],
"tokens_used": 2232,
"query": "authentication",
"compilation_time_ms": 243
}Show database statistics:
avocado statsExample output:
Database Statistics:
Artifacts: 42
Spans: 387
Total Tokens: 125,431
Average Tokens/Span: 324
Clear all data from the database:
avocado clearWarning: This permanently deletes all ingested documents and embeddings!
Use AvocadoDB as a library in your Rust projects:
[dependencies]
avocado-core = "2.1"
tokio = { version = "1.35", features = ["full"] }use avocado_core::{Database, VectorIndex, compiler, types::CompilerConfig};
#[tokio::main]
async fn main() -> avocado_core::types::Result<()> {
// Open database
let db = Database::new(".avocado/db.sqlite")?;
// Load vector index from database
let index = VectorIndex::from_database(&db)?;
// Configure compilation
let config = CompilerConfig {
token_budget: 8000,
semantic_weight: 0.7,
lexical_weight: 0.3,
mmr_lambda: 0.5,
enable_mmr: true,
};
// Compile context
let working_set = compiler::compile(
"How does authentication work?",
config,
&db,
&index,
Some("your-openai-api-key")
).await?;
println!("Compiled {} spans using {} tokens",
working_set.spans.len(),
working_set.tokens_used
);
println!("Deterministic hash: {}", working_set.deterministic_hash());
// Use working_set.text in your LLM prompt
println!("Context:\n{}", working_set.text);
Ok(())
}avocadodb/
├── avocado-core/ # Core engine (Rust)
├── avocado-cli/ # Command-line tool
├── avocado-server/ # HTTP server
├── python/ # Python SDK
├── migrations/ # Database schema
├── tests/ # Integration tests
└── docs/ # Documentation
# Unit tests
cargo test
# Integration tests (requires OPENAI_API_KEY)
cargo test --test determinism -- --ignored
cargo test --test performance -- --ignored
cargo test --test correctness -- --ignored# Development build
cargo build
# Release build
cargo build --release
# Run CLI
cargo run --bin avocado -- --help
# Run server
cargo run --bin avocado-server- Core span extraction with smart boundaries
- OpenAI embeddings integration
- Hybrid search (semantic + lexical)
- MMR diversification algorithm
- Deterministic compilation (100% verified)
- CLI tool with full features
- HTTP server
- Performance optimization (240ms avg)
- Comprehensive documentation
- Version manifest for full reproducibility
- Explain plan for retrieval debugging
- Working set diff for corpus auditing
- Smart incremental rebuild (content-hash based)
- Evaluation metrics (recall@k, MRR)
- Multi-modal support (images, code)
- Advanced retrieval (BM25, learned rankers)
- PostgreSQL support
- Framework integrations (LangChain, LlamaIndex)
- Session management
- Working set versioning
- Collaborative features
- Memory systems
We welcome contributions! See CONTRIBUTING.md for guidelines.
MIT License - see LICENSE for details.
AvocadoDB includes comprehensive test suites to validate determinism and performance:
# Run all tests and generate report
./scripts/run-tests.sh
# Run determinism validation only (100 iterations)
./scripts/test-determinism.sh
# Run performance benchmarks
./scripts/benchmark.shSee docs/testing.md for complete testing documentation.
- Quick Start Guide - Get running in 5 minutes
- Examples - Real-world usage patterns
- Testing Guide - Validation and benchmarking
- Performance Analysis
- UI Improvements
Built by the AvocadoDB Team | Making retrieval deterministic, one context at a time.