
LocalRAG: Self-Hosted RAG System for Code & Documents

A Docker-powered RAG system that understands the difference between code and prose. Ingest your codebase and documentation, then query them with full privacy and zero configuration.

[Screenshot: Dashboard Overview]


🎯 Why This Exists

Most RAG systems treat all data the same: they chunk your Python files the same way they chunk your PDFs. This is a mistake.

LocalRAG uses context-aware ingestion:

  • Code collections use AST-based chunking that respects function boundaries
  • Document collections use semantic chunking optimized for prose
  • Separate collections prevent context pollution (your API docs don't interfere with your codebase queries)

Example:

# Ask about your docs
"What was our Q3 strategy?" → queries the 'company_docs' collection

# Ask about your code
"Show me the authentication middleware" → queries the 'backend_code' collection

This separation is what makes answers actually useful.
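
To make "AST-based chunking" concrete, here is a minimal illustrative sketch in Python using the standard ast module. This is not LocalRAG's actual implementation (that lives in backend/src/core and also handles imports, nested definitions, and other languages); it only shows the core idea of splitting on definition boundaries:

import ast

def chunk_by_definition(source: str) -> list[str]:
    """Split Python source into one chunk per top-level function or class."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno/end_lineno are 1-indexed, so adjust the slice
            chunks.append("\n".join(lines[node.lineno - 1 : node.end_lineno]))
    return chunks

Each chunk is a complete definition, so retrieval never returns half a function, which is exactly where fixed-size chunking fails on code.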


⚡ Quick Start (5 Minutes)

Prerequisites:

  • Docker & Docker Compose
  • Ollama running locally

Setup:

# 1. Pull the embedding model
ollama pull nomic-embed-text

# 2. Clone and start
git clone https://github.com/2dogsandanerd/Knowledge-Base-Self-Hosting-Kit.git
cd Knowledge-Base-Self-Hosting-Kit
docker compose up -d

That's it. Open http://localhost:8080
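
To verify the stack is healthy without opening a browser, a quick Python check (a minimal sketch; it assumes only that the Nginx gateway answers on port 8080):

import requests

# The gateway serves the frontend once all containers are up
resp = requests.get("http://localhost:8080", timeout=5)
print("Gateway status:", resp.status_code)  # expect 200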


🚀 Try It: Upload & Query (30 Seconds)

  1. Go to the Upload tab
  2. Upload any PDF or Markdown file
  3. Go to the Quicksearch tab
  4. Select your collection and ask a question

[Screenshot: Query Interface]


💡 The Power Move: Analyze Your Own Codebase

Let's ingest this repository's backend code and query it like a wiki.

Step 1: Copy code into the data folder

# The ./data/docs folder is mounted as / in the container
cp -r backend/src data/docs/localrag_code

Step 2: Ingest via UI

  • Navigate to Folder Ingestion tab
  • Path: /localrag_code
  • Collection: localrag_code
  • Profile: Codebase (uses code-optimized chunking)
  • Click Start Ingestion

[Screenshot: Folder Ingestion]

Step 3: Query your code

  • Go to Quicksearch
  • Select localrag_code collection
  • Ask: "How does the folder ingestion work?" or "Show me the RAGClient class"

You'll get answers with direct code snippets (see the API sketch after this list). This is invaluable for:

  • Onboarding new developers
  • Understanding unfamiliar codebases
  • Debugging complex systems
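
The same query also works over the REST API. This sketch reuses the /query endpoint and payload shape from the API example further down:

import requests

# Ask the ingested code collection a question programmatically
result = requests.post(
    "http://localhost:8080/api/v1/rag/query",
    json={"query": "Show me the RAGClient class", "collection": "localrag_code", "k": 3},
).json()
print(result.get("answer"))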

πŸ—οΈ Architecture

┌──────────────────────────────────────────────────┐
│          Your Browser (localhost:8080)           │
└──────────────────────────┬───────────────────────┘
                           │
┌──────────────────────────▼───────────────────────┐
│              Gateway (Nginx)                     │
│  - Serves static frontend                        │
│  - Proxies /api/* to backend                     │
└──────────────────────────┬───────────────────────┘
                           │
┌──────────────────────────▼───────────────────────┐
│       Backend (FastAPI + LlamaIndex)             │
│  - REST API for ingestion & queries              │
│  - Async task management                         │
│  - Orchestrates ChromaDB & Ollama                │
└─────────────────┬──────────────────┬─────────────┘
                  │                  │
┌─────────────────▼──────┐  ┌────────▼──────────────┐
│  ChromaDB              │  │  Ollama               │
│  - Vector storage      │  │  - Embeddings         │
│  - Persistent on disk  │  │  - Answer generation  │
└────────────────────────┘  └───────────────────────┘

Tech Stack:

  • Backend: FastAPI, LlamaIndex 0.12.9
  • Vector DB: ChromaDB 0.5.23
  • LLM/Embeddings: Ollama (configurable)
  • Document Parser: Docling 2.13.0 (advanced OCR, table extraction)
  • Frontend: Vanilla HTML/JS (no build step)

Linux Users: If Ollama runs on your host, you may need to set OLLAMA_HOST=http://host.docker.internal:11434 in .env or use --network host.


✨ Features

  • ✅ 100% Local & Private - Your data never leaves your machine
  • ✅ Zero Config - docker compose up and you're running
  • ✅ Batch Ingestion - Process multiple files (sequential processing in Community Edition)
  • ✅ Code & Doc Profiles - Different chunking strategies for code vs. prose
  • ✅ Smart Ingestion - Auto-detects file types, avoids duplicates
  • ✅ .ragignore Support - Works like .gitignore to exclude files/folders (example below)
  • ✅ Full REST API - Programmatic access for automation
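
For example, a .ragignore at the root of your ingestion folder might look like this (a hypothetical sample; patterns follow .gitignore syntax as noted above):

# Exclude dependencies and build artifacts from ingestion
node_modules/
__pycache__/
dist/
*.log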

🐍 API Example

import requests
import time

BASE_URL = "http://localhost:8080/api/v1/rag"

# 1. Create a collection
print("Creating collection...")
requests.post(f"{BASE_URL}/collections", json={"collection_name": "api_docs"})

# 2. Upload a document
print("Uploading README.md...")
with open("README.md", "rb") as f:
    response = requests.post(
        f"{BASE_URL}/documents/upload",
        files={"files": ("README.md", f, "text/markdown")},
        data={"collection_name": "api_docs"},
    ).json()

task_id = response.get("task_id")
print(f"Task ID: {task_id}")

# 3. Poll for completion
while True:
    status = requests.get(f"{BASE_URL}/ingestion/ingest-status/{task_id}").json()
    print(f"Status: {status['status']}, Progress: {status['progress']}%")
    if status["status"] in ["completed", "failed"]:
        break
    time.sleep(2)

# 4. Query
print("\nQuerying...")
result = requests.post(
    f"{BASE_URL}/query",
    json={"query": "What is the killer feature?", "collection": "api_docs", "k": 3},
).json()

print("\nAnswer:")
print(result.get("answer"))

print("\nSources:")
for source in result.get("metadata", []):
    print(f"- {source.get('filename')}")

🔧 Configuration

Create a .env file to customize:

# Change the public port
PORT=8090

# Swap LLM/embedding models
LLM_PROVIDER=ollama
LLM_MODEL=llama3:8b
EMBEDDING_MODEL=nomic-embed-text

# Use OpenAI/Anthropic instead
# LLM_PROVIDER=openai
# OPENAI_API_KEY=sk-...

See .env.example for all options.


👨‍💻 Development

Hot-Reloading:
The backend uses Uvicorn's auto-reload. Edit files in backend/src and changes apply instantly.
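
For reference, a minimal sketch of the compose wiring that enables hot-reload (illustrative only; the repo's actual docker-compose.yml may differ, and the src.main:app module path is inferred from the project structure below):

services:
  backend:
    build: ./backend
    volumes:
      - ./backend/src:/app/src        # bind mount: host edits appear in the container
    command: uvicorn src.main:app --host 0.0.0.0 --reload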

Rebuild after dependency changes:

docker compose up -d --build backend

Project Structure:

localrag/
├── backend/
│   ├── src/
│   │   ├── api/          # FastAPI routes
│   │   ├── core/         # RAG logic (RAGClient, services)
│   │   ├── models/       # Pydantic models
│   │   └── main.py       # Entry point
│   ├── Dockerfile
│   └── requirements.txt
├── frontend/             # Static HTML/JS
├── nginx/                # Reverse proxy config
├── data/                 # Mounted volume for ingestion
└── docker-compose.yml

🧪 Advanced: Multi-Collection Search

You can query across multiple collections simultaneously:

result = requests.post(
    f"{BASE_URL}/query",
    json={
        "query": "How do we handle authentication?",
        "collections": ["backend_code", "api_docs"],  # Note: plural
        "k": 5
    }
).json()

This is useful when answers might span code and documentation.


📊 What Makes This Different?

Feature               LocalRAG                     Typical RAG
Code-aware chunking   ✅ AST-based                 ❌ Fixed-size
Context separation    ✅ Per-collection profiles   ❌ One-size-fits-all
Self-hosted           ✅ 100% local                ⚠️ Often cloud-dependent
Zero config           ✅ Docker Compose            ❌ Complex setup
Async ingestion       ✅ Background tasks          ⚠️ Varies
Production-ready      ✅ FastAPI + ChromaDB        ⚠️ Often prototypes

🚧 Roadmap

  • Support for more LLM providers (Anthropic, Cohere)
  • Advanced reranking (Cohere Rerank, Cross-Encoder)
  • Multi-modal support (images, diagrams)
  • Graph-based retrieval for code dependencies
  • Evaluation metrics dashboard (RAGAS integration)

📜 License

MIT License. See LICENSE for details.


πŸ™ Built With


🤝 Contributing

Contributions are welcome! Please:

  1. Fork the repo
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request


⭐ If you find this useful, please star the repo!
