
AvocadoDB

The first deterministic context database for AI agents

Fix your RAG in 5 minutes - same query, same context, every time.


What is AvocadoDB?

AvocadoDB is a span-based context compiler that replaces traditional vector databases' chaotic "top-k" retrieval with deterministic, citation-backed context generation.

Pure Rust embeddings: 6x faster than OpenAI, fully offline, and $0 in API costs.

The Problem with RAG

Current RAG systems are fundamentally broken:

  • ❌ Same query → different results each time (non-deterministic)
  • ❌ Token budgets wasted on duplicates (60-70% utilization)
  • ❌ No citations or verifiability
  • ❌ Hallucinations from inconsistent context
  • ❌ Slow (200-300ms just for OpenAI embedding calls)
  • ❌ Expensive (API costs scale with usage)

The AvocadoDB Solution

  • 100% Deterministic: Same query → same context, every time
  • 6x Faster: 40-60ms compilation (vs 240-360ms with OpenAI)
  • Zero Cost: Pure Rust embeddings, no API required
  • Works Offline: No internet needed after initial setup
  • Citation-Backed: Every span has exact line number citations
  • Token Efficient: 95%+ budget utilization
  • Drop-in Replacement: Works with any LLM (sketch below)
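
Because the compiled context is plain, citation-backed text, it drops into any prompt. A minimal Python sketch using the Python SDK described below (the prompt wording is illustrative, and documents are assumed to be already ingested):

from avocado import AvocadoDB

db = AvocadoDB()  # assumes ./docs has already been ingested
result = db.compile("How does authentication work?", budget=8000)

prompt = (
    "Answer using only the cited context below.\n\n"
    f"{result.text}\n\n"
    "Question: How does authentication work?"
)
# Hand `prompt` to whichever LLM client you use.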

⚡ Performance

# Run benchmarks on your hardware
./target/release/avocado benchmark

# Results (M1 Mac example):
# Single embedding: 1.2ms  (vs ~250ms OpenAI)
# Batch of 100:     8.7ms  (vs ~250ms OpenAI)
# Full compilation: 43ms   (vs ~300ms OpenAI)
#
# Speedup: 6-7x faster ⚡
# Cost: $0 (vs ~$0.0001 per 1K tokens)

See EMBEDDING_PERFORMANCE.md for detailed benchmarks.

Quick Start

Install from crates.io (Easiest)

cargo install avocado-cli

That's it! Now you can use avocado directly:

avocado --version
avocado init
avocado ingest ./docs --recursive
avocado compile "your query"

Docker (Recommended for Server)

Run the server with Docker:

# Run with Docker
docker run -d \
  -p 8765:8765 \
  -v avocado-data:/data \
  --name avocadodb \
  avocadodb/avocadodb:latest

# Or use Docker Compose
docker-compose up -d

# Test the server
curl http://localhost:8765/health

See Docker Guide for complete documentation.

Installation from Source

# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Clone and build
git clone https://github.com/avocadodb/avocadodb
cd avocadodb
cargo build --release

# Optional: Set OpenAI API key (only if you want to use OpenAI embeddings)
# By default, AvocadoDB uses local embeddings (no API key required, no Python required!)
#
# Local embeddings strategy (automatic, in priority order):
# 1. Pure Rust with fastembed (semantic, good quality, no Python required) ✅ DEFAULT
#    - Uses all-MiniLM-L6-v2 model (384 dimensions) by default
#    - ONNX-based, fast and efficient
#    - Model downloaded automatically on first use (~90MB)
#    - To increase dimensionality, set AVOCADODB_EMBEDDING_MODEL:
#      * "nomic" or "nomicv15" → 768 dimensions (good balance)
#      * "bgelarge" or "bge-large-en-v1.5" → 1024 dimensions (higher quality)
# 2. Python + sentence-transformers (fallback if fastembed unavailable)
#    - Requires: pip install sentence-transformers
# 3. Hash-based fallback (deterministic, but NOT semantic)
#    - Works always, but poor semantic quality
#
# To use OpenAI embeddings instead:
# export OPENAI_API_KEY="sk-..."
# export AVOCADODB_EMBEDDING_PROVIDER=openai

CLI Usage (Daemon by default)

# Initialize database
./target/release/avocado init

# Get model recommendation (optional)
./target/release/avocado recommend --corpus-size 5000 --use-case production
# Recommends optimal embedding model for your use case

# Ingest documents
./target/release/avocado ingest ./docs --recursive
# Output: Ingested 42 files → 387 spans

# Compile context (uses daemon at http://localhost:8765 by default)
./target/release/avocado compile "How does authentication work?" --budget 8000
# Force local mode (uses .avocado/db.sqlite in current project)
./target/release/avocado compile "How does authentication work?" --local --budget 8000

# Run performance benchmarks
./target/release/avocado benchmark
# Shows real performance on your hardware

GPU-backed server (Modal) quickstart

# Start the daemon with remote GPU embeddings (Modal)
avocado serve --gpu --embed-url https://<your-modal-endpoint>/embed
# or CPU/local (default)
avocado serve

Example Output:

Compiling context for: "How does authentication work?"
Token budget: 8000

[1] docs/authentication.md
Lines 1-23

# Authentication System

Our authentication uses JWT tokens with secure refresh mechanisms...

---

[2] src/middleware/auth.ts
Lines 45-78

export function authenticateRequest(req: Request) {
  const token = req.headers.authorization?.split(' ')[1];
  if (!token) throw new UnauthorizedError();
  ...
}

---

Compiled 12 spans using 7,891 tokens (98.6% utilization)
Compilation time: 243ms
Context hash: e3b0c4429...52b855 (deterministic ✓)

Python SDK

# Install the SDK
cd sdks/python
pip install -e .

Then, in Python:

from avocado import AvocadoDB

db = AvocadoDB()
db.ingest("./docs", recursive=True)

result = db.compile("my query", budget=8000)
print(result.text)  # Deterministic every time

TypeScript SDK

# Install and build the SDK
cd sdks/typescript
npm install
npm run build

Then, in TypeScript:

import { AvocadoDB } from 'avocadodb';

const db = new AvocadoDB();
await db.ingest('./docs', { recursive: true });

const result = await db.compile('my query', { budget: 8000 });
console.log(result.text);  // Deterministic every time

HTTP Server (Multi-project daemon)

# Start server (binds to 127.0.0.1 by default)
./target/release/avocado-server

# Use the API
curl -X POST http://localhost:8765/compile \
  -H "Content-Type: application/json" \
  -d '{"query": "authentication", "token_budget": 8000, "project": "'"$PWD"'"}'

Docker & Kubernetes Deployment

AvocadoDB is production-ready with full Docker and Kubernetes support.

Docker

# Quick start with Docker
docker run -d -p 8765:8765 -v avocado-data:/data avocadodb/avocadodb:latest

# Or use Docker Compose
docker-compose up -d

Features:

  • Multi-stage build for minimal image size (~80-100MB)
  • Multi-architecture support (linux/amd64, linux/arm64)
  • Non-root user for security
  • Health checks built-in
  • Configurable via environment variables

See Docker Guide for complete documentation.

Kubernetes

# Deploy to Kubernetes
kubectl apply -k k8s/

# Verify deployment
kubectl get pods -l app=avocadodb

Includes:

  • Production-ready Deployment manifests
  • Horizontal scaling support
  • Persistent storage configuration
  • Ingress with TLS/HTTPS
  • ConfigMaps and Secrets management
  • Resource limits and health checks

See Kubernetes Guide for complete documentation.

Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| PORT | 8765 | HTTP server port |
| BIND_ADDR | 127.0.0.1 | Bind address (set 0.0.0.0 to expose publicly) |
| RUST_LOG | info | Log level |
| AVOCADODB_EMBEDDING_MODEL | minilm | Embedding model (minilm, nomic, bgelarge) |
| AVOCADODB_EMBEDDING_PROVIDER | local | Provider (local or openai) |
| OPENAI_API_KEY | - | OpenAI API key (if using OpenAI) |
| AVOCADODB_ROOT | unset | Optional project root; when set, all project paths must be under this directory, and requests outside it are rejected |
| API_TOKEN | unset | If set, all routes (except /health and /api-docs/*) require a matching X-Avocado-Token header |
| MAX_BODY_BYTES | 2097152 (2 MB) | Request body size limit to protect against large payloads |

Security note:

  • Do not expose the server publicly without protection. If you must, set BIND_ADDR=0.0.0.0 and front it with auth.
  • For local safety, clients always send an explicit project (their current working directory); the server normalizes paths and can be restricted to AVOCADODB_ROOT.
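
If API_TOKEN is set, every request (except /health and /api-docs/*) must carry the matching header. A minimal Python sketch:

import os

import requests

resp = requests.post(
    "http://localhost:8765/compile",
    # X-Avocado-Token must equal the server's API_TOKEN
    headers={"X-Avocado-Token": os.environ["API_TOKEN"]},
    json={"query": "authentication", "token_budget": 8000, "project": os.getcwd()},
    timeout=30,
)
resp.raise_for_status()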

How It Works

Architecture

Query → Embed → [Semantic Search + Lexical Search] → Hybrid Fusion
      → MMR Diversification → Token Packing → Deterministic Sort → WorkingSet

Key Innovations

  1. Span-Based Indexing: Documents are split into spans (20-50 lines) with precise line numbers
  2. Hybrid Retrieval: Combines semantic (vector) and lexical (keyword) search
  3. Deterministic Ordering: Results sorted by (artifact_id, start_line) for reproducibility
  4. Greedy Token Packing: Maximizes token budget utilization without duplicates (see the sketch after this list)
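
To make the fusion, packing, and ordering steps concrete, here is a simplified Python sketch. Field names and the RRF constant are illustrative assumptions, not AvocadoDB's actual internals:

from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    artifact_id: str
    start_line: int
    token_count: int

def rrf_fuse(semantic: list[Span], lexical: list[Span], k: int = 60) -> list[Span]:
    """Reciprocal Rank Fusion: score(s) = sum over rankings of 1 / (k + rank)."""
    scores: dict[Span, float] = {}
    for ranking in (semantic, lexical):
        for rank, span in enumerate(ranking, start=1):
            scores[span] = scores.get(span, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)  # also dedupes

def pack_and_order(candidates: list[Span], budget: int) -> list[Span]:
    """Greedily fill the token budget, then sort for reproducibility."""
    selected, used = [], 0
    for span in candidates:  # already ranked by fused score
        if used + span.token_count <= budget:
            selected.append(span)
            used += span.token_count
    # Deterministic ordering: (artifact_id, start_line), independent of score
    return sorted(selected, key=lambda s: (s.artifact_id, s.start_line))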

Explainability & Reproducibility (v2.1)

NEW in v2.1: Enhanced determinism, explainability, and quality tracking features based on production feedback.

Version Manifest

Every compilation now includes a version manifest for full reproducibility:

// Access manifest from WorkingSet
let manifest = working_set.manifest.unwrap();
println!("Avocado version: {}", manifest.avocado_version);
println!("Embedding model: {}", manifest.embedding_model);
println!("Context hash: {}", manifest.context_hash);

The manifest includes: avocado version, tokenizer, embedding model, embedding dimensions, chunking params, index params, and a SHA256 context hash.

Explain Plan

Understand exactly how context was selected with explain mode:

# CLI with explain
avocado compile "authentication" --explain

# Shows candidates at each pipeline stage:
# - Semantic search (top 50 from HNSW)
# - Lexical search (keyword matches)
# - Hybrid fusion (RRF combination)
# - MMR diversification
# - Token packing
# - Final deterministic order

From the Python SDK:

result = db.compile("auth", budget=8000, explain=True)
if result.explain:
    print(f"Semantic candidates: {len(result.explain.semantic_candidates)}")
    print(f"Final spans: {len(result.explain.final_order)}")

Working Set Diff

Compare retrieval results across corpus versions for auditing:

use avocado_core::{diff_working_sets, summarize_diff};

let diff = diff_working_sets(&before, &after);
println!("{}", summarize_diff(&diff));
// Output: "3 added, 1 removed, 2 reranked"

Smart Incremental Rebuild

Only re-embed changed files - unchanged content is automatically skipped:

# First ingest
avocado ingest ./docs --recursive
# Ingested 42 files → 387 spans

# Re-ingest after editing 3 files
avocado ingest ./docs --recursive
# Skipped 39 unchanged, Updated 3 files → 28 spans

Content-hash comparison ensures minimal re-embedding while keeping the index fresh.
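
Conceptually, the skip logic is a per-file content-hash comparison. A simplified Python sketch of the idea (the real implementation lives in the Rust core):

import hashlib
from pathlib import Path

def files_to_reembed(paths: list[Path], stored: dict[str, str]) -> list[Path]:
    """Return only the files whose content hash differs from the stored one."""
    changed = []
    for path in paths:
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if stored.get(str(path)) != digest:
            changed.append(path)        # new or modified: re-extract and re-embed
            stored[str(path)] = digest  # remember the fresh hash
    return changed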

Evaluation Metrics

Built-in support for golden set testing and quality metrics:

use avocado_core::{GoldenQuery, evaluate};

let queries = vec![
    GoldenQuery {
        query: "authentication".to_string(),
        expected_paths: vec!["docs/auth.md".to_string()],
        k: 10,
    },
];

let summary = evaluate(&queries, &db, &index, &config).await?;
println!("Recall@10: {:.2}%", summary.mean_recall * 100.0);
println!("MRR: {:.3}", summary.mean_mrr);

Session Management

NEW in v2.0: Multi-turn conversation tracking with context compilation

AvocadoDB now supports session management, enabling AI agents to maintain conversation history and context across multiple interactions.

Quick Example

from avocado import AvocadoDB

db = AvocadoDB(mode="http")

# Create a session
session = db.create_session(user_id="alice", title="Project Q&A")

# Multi-turn conversation
result = session.compile("What is AvocadoDB?", budget=8000)
session.add_message("assistant", "AvocadoDB is a deterministic context database...")

result2 = session.compile("How does the compiler work?")
session.add_message("assistant", "The compiler uses hybrid search...")

# Get conversation history
history = session.get_history()

# Replay for debugging
replay = session.replay()

Features

  • Multi-turn conversations: Track user queries and agent responses
  • Context compilation: Automatically compile context for each query
  • Conversation history: Retrieve formatted history with token limiting
  • Session replay: Debug agent behavior by replaying entire sessions
  • Persistence: Sessions stored in SQLite with full ACID guarantees

Available in

  • Python SDK: Full session support with Session class
  • TypeScript SDK: Complete session management API
  • CLI: Session commands for interactive use
  • HTTP API: RESTful endpoints for all session operations

See SESSION_MANAGEMENT.md for complete documentation.

Why Determinism Matters

When RAG systems return different context for the same query:

  • LLMs produce inconsistent answers
  • Users can't verify results
  • Debugging is impossible
  • Trust is broken

AvocadoDB fixes this with deterministic compilation - same query, same context, every time.

Verify Determinism Yourself

# Run the same query multiple times
avocado compile "authentication" --budget 8000 | head -100 | sha256sum
# e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

avocado compile "authentication" --budget 8000 | head -100 | sha256sum
# e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

# Same hash every single time! ✅
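
The same check can be scripted from the Python SDK by hashing result.text directly:

import hashlib
from avocado import AvocadoDB

db = AvocadoDB()
runs = [db.compile("authentication", budget=8000).text for _ in range(2)]
hashes = [hashlib.sha256(text.encode()).hexdigest() for text in runs]
assert hashes[0] == hashes[1], "context should be byte-identical across runs"
print(hashes[0])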

Performance

Phase 1 achieves production-ready performance:

| Metric | Target | Actual | Status |
|--------|--------|--------|--------|
| Compilation time (8K tokens) | < 500ms | ~50ms avg | ✅ 10x faster |
| Token budget utilization | > 95% | 90-95% | ✅ Excellent |
| Determinism | 100% | 100% | ✅ Perfect |
| Duplicate spans | 0 | 0 | ✅ Perfect |

Breakdown for 8K token budget compilation (with Pure Rust embeddings):

Embed query:          1-5ms      (2-5% of total) - Pure Rust (fastembed), local
Semantic search:      <1ms       (Vector similarity, HNSW)
Lexical search:       <1ms       (SQL LIKE query)
Hybrid fusion:        <1ms       (RRF score combination)
MMR diversification:  5-10ms     (Diversity selection)
Token packing:        <1ms       (Greedy budget allocation)
Deterministic sort:   <1ms       (Stable sort)
Build context:        <1ms       (Text concatenation)
Count tokens:         30-40ms    (tiktoken encoding)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
TOTAL:                40-60ms    (6x faster than OpenAI!)

Performance Comparison:

| Metric | Pure Rust (fastembed) | OpenAI API |
|--------|----------------------|------------|
| Query embedding | 1-5ms | 200-300ms |
| Total compilation | 40-60ms | 240-360ms |
| Throughput | 200-1000 texts/sec | 3-5 batches/sec |
| Cost | Free | ~$0.0001/1K tokens |
| Rate limits | None | Varies by tier |
| Offline | ✅ Yes | ❌ No |
| Quality | Good (384 dims) | Excellent (1536 dims) |

Pure Rust embeddings are 6x faster and completely free, and the retrieval algorithms themselves (fusion, MMR, packing, sorting) run in under 15ms combined.

See docs/performance.md for detailed analysis and scaling characteristics.

CLI Reference

avocado init

Initialize a new AvocadoDB database:

avocado init [--path <db-path>]

Creates .avocado/ directory with SQLite database and vector index.

avocado ingest

Ingest documents into the database:

avocado ingest <path> [--recursive]

Examples:

# Ingest single file
avocado ingest README.md

# Ingest entire directory recursively
avocado ingest docs/ --recursive

# Ingest specific file types
avocado ingest src/ --recursive --include "*.rs,*.md,*.toml"

The ingestion process:

  1. Reads document content
  2. Extracts spans (20-50 lines with smart boundaries; sketched below)
  3. Generates embeddings for each span (local fastembed by default)
  4. Stores in SQLite database
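
A toy Python sketch of step 2 using fixed-size windows (AvocadoDB's actual chunker picks smarter, content-aware boundaries):

# Toy span extraction: split a file into <=40-line spans with 1-based
# line numbers for citations. Real boundaries are content-aware.
def extract_spans(text: str, max_lines: int = 40) -> list[dict]:
    lines = text.splitlines()
    spans = []
    for start in range(0, len(lines), max_lines):
        chunk = lines[start:start + max_lines]
        spans.append({
            "start_line": start + 1,
            "end_line": start + len(chunk),
            "text": "\n".join(chunk),
        })
    return spans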

avocado compile

Compile a deterministic context for a query:

avocado compile <query> [OPTIONS]

Options:

  • --budget <tokens>: Token budget (default: 8000)
  • --json: Output as JSON instead of human-readable format
  • --explain: Show explain plan with candidates at each pipeline stage
  • --mmr-lambda <0.0-1.0>: MMR diversity parameter (default: 0.5; formula below)
    • Higher values (0.7-1.0) = more relevant but potentially redundant
    • Lower values (0.0-0.3) = more diverse but potentially less relevant
  • --semantic-weight <float>: Semantic search weight (default: 0.7)
  • --lexical-weight <float>: Lexical search weight (default: 0.3)
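
The --mmr-lambda flag is the λ in the standard Maximal Marginal Relevance objective (the usual formulation; sim is a similarity function such as cosine, C the candidate set, S the spans selected so far, q the query):

% Pick the next span by balancing relevance against redundancy:
\text{next} = \arg\max_{s \in C \setminus S} \Big[ \lambda \, \mathrm{sim}(s, q) \;-\; (1 - \lambda) \max_{t \in S} \mathrm{sim}(s, t) \Big]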

Examples:

# Basic compilation
avocado compile "How does authentication work?"

# Large context window
avocado compile "error handling patterns" --budget 16000

# Prioritize diversity over relevance
avocado compile "testing strategies" --mmr-lambda 0.3

# Tune search weights (more keyword matching)
avocado compile "API endpoints" --semantic-weight 0.5 --lexical-weight 0.5

# JSON output for programmatic use
avocado compile "authentication" --budget 8000 --json

JSON Output Format:

{
  "text": "[1] docs/auth.md\nLines 1-23\n\n# Authentication...",
  "spans": [
    {
      "id": "uuid",
      "artifact_id": "uuid",
      "start_line": 1,
      "end_line": 23,
      "text": "# Authentication...",
      "embedding": [0.002, 0.013, ...],
      "embedding_model": "text-embedding-ada-002",
      "token_count": 127,
      "metadata": null
    }
  ],
  "citations": [
    {
      "span_id": "uuid",
      "artifact_id": "uuid",
      "artifact_path": "docs/auth.md",
      "start_line": 1,
      "end_line": 23,
      "score": 0.0
    }
  ],
  "tokens_used": 2232,
  "query": "authentication",
  "compilation_time_ms": 243
}
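
The JSON output is convenient for scripting. A small Python sketch that invokes the CLI and prints the citations (field names taken from the schema above):

import json
import subprocess

out = subprocess.run(
    ["avocado", "compile", "authentication", "--budget", "8000", "--json"],
    capture_output=True, text=True, check=True,
).stdout
result = json.loads(out)

for c in result["citations"]:
    print(f'{c["artifact_path"]}:{c["start_line"]}-{c["end_line"]}')
print(f'{result["tokens_used"]} tokens in {result["compilation_time_ms"]}ms')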

avocado stats

Show database statistics:

avocado stats

Example output:

Database Statistics:
  Artifacts: 42
  Spans: 387
  Total Tokens: 125,431
  Average Tokens/Span: 324

avocado clear

Clear all data from the database:

avocado clear

Warning: This permanently deletes all ingested documents and embeddings!

Library Usage (Rust)

Use AvocadoDB as a library in your Rust projects:

Add the dependencies to Cargo.toml:

[dependencies]
avocado-core = "2.1"
tokio = { version = "1.35", features = ["full"] }

Then:

use avocado_core::{Database, VectorIndex, compiler, types::CompilerConfig};

#[tokio::main]
async fn main() -> avocado_core::types::Result<()> {
    // Open database
    let db = Database::new(".avocado/db.sqlite")?;

    // Load vector index from database
    let index = VectorIndex::from_database(&db)?;

    // Configure compilation
    let config = CompilerConfig {
        token_budget: 8000,
        semantic_weight: 0.7,
        lexical_weight: 0.3,
        mmr_lambda: 0.5,
        enable_mmr: true,
    };

    // Compile context
    let working_set = compiler::compile(
        "How does authentication work?",
        config,
        &db,
        &index,
        Some("your-openai-api-key")
    ).await?;

    println!("Compiled {} spans using {} tokens",
        working_set.spans.len(),
        working_set.tokens_used
    );

    println!("Deterministic hash: {}", working_set.deterministic_hash());

    // Use working_set.text in your LLM prompt
    println!("Context:\n{}", working_set.text);

    Ok(())
}

Development

Project Structure

avocadodb/
├── avocado-core/      # Core engine (Rust)
├── avocado-cli/       # Command-line tool
├── avocado-server/    # HTTP server
├── python/            # Python SDK
├── migrations/        # Database schema
├── tests/             # Integration tests
└── docs/              # Documentation

Running Tests

# Unit tests
cargo test

# Integration tests (requires OPENAI_API_KEY)
cargo test --test determinism -- --ignored
cargo test --test performance -- --ignored
cargo test --test correctness -- --ignored

Building

# Development build
cargo build

# Release build
cargo build --release

# Run CLI
cargo run --bin avocado -- --help

# Run server
cargo run --bin avocado-server

Roadmap

Phase 1 ✅ (Complete)

  • Core span extraction with smart boundaries
  • OpenAI embeddings integration
  • Hybrid search (semantic + lexical)
  • MMR diversification algorithm
  • Deterministic compilation (100% verified)
  • CLI tool with full features
  • HTTP server
  • Performance optimization (240ms avg)
  • Comprehensive documentation

Phase 2 - Advanced Features

  • Version manifest for full reproducibility
  • Explain plan for retrieval debugging
  • Working set diff for corpus auditing
  • Smart incremental rebuild (content-hash based)
  • Evaluation metrics (recall@k, MRR)
  • Multi-modal support (images, code)
  • Advanced retrieval (BM25, learned rankers)
  • PostgreSQL support
  • Framework integrations (LangChain, LlamaIndex)

Phase 3 - Agent Memory

  • Session management
  • Working set versioning
  • Collaborative features
  • Memory systems

Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

License

MIT License - see LICENSE for details.

Testing

AvocadoDB includes comprehensive test suites to validate determinism and performance:

# Run all tests and generate report
./scripts/run-tests.sh

# Run determinism validation only (100 iterations)
./scripts/test-determinism.sh

# Run performance benchmarks
./scripts/benchmark.sh

See docs/testing.md for complete testing documentation.


Built by the AvocadoDB Team | Making retrieval deterministic, one context at a time.
